MaxCoderz

for your 1 bit pleasure!

All times are UTC




Posted: Fri 20 Oct, 2006 10:59 am
Maxcoderz Staff

Joined: Thu 16 Dec, 2004 10:06 pm
Posts: 3064
Location: Croydon, England
Updated syntax.txt

CoBB wrote:
benryves wrote:
Characters which would not be allowed would include any whitespace (tab or space), any character used in an operator, any character that is seen as punctuation ( ) [ ] , \, directive marker (# or .) and constant prefixes ($ % @). This is quickly off the top of my head, so if you can think of any further items that are definite no-nos, please tell me!

The tricky part is thinking in Unicode again. You should exclude every kind of punctuation and white space, only allowing underscore and apostrophe (which is needed for shadow registers with your tokeniser; by the way, doesn’t that cause trouble with single-quoted literals?).
Good points. As for the apostrophe - my inner pedant gets grumpy with label names such as "DontRender", so yes - it would be nice to allow it. ;)
The tokeniser is sufficiently "bright" to allow for AF' (and not catch it as "AF and oh we're about to read a string") by only allowing string literals to be declared under certain situations (after an operator, after an opening bracket/parenthesis, at the start of a string, &c).
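
A minimal Python sketch of that context rule (not Brass's actual tokeniser; the function name, the token kinds and the exact operator set are assumptions made for illustration). A quote only opens a string literal at the start of the input or straight after an operator, comma or opening bracket; anywhere else it stays part of the name, so AF' survives intact.

```python
# Hypothetical sketch, not Brass's real tokeniser.
OPERATORS = set("+-*/=<>&|^~!,")
OPENERS = set("([")
CLOSERS = set(")]")
NAME_STOP = OPERATORS | OPENERS | CLOSERS | set(" \t")

def tokenise(source):
    tokens = []
    i = 0
    string_allowed = True  # a string may start at the very beginning
    while i < len(source):
        ch = source[i]
        if ch in " \t":
            i += 1  # whitespace does not change the context
        elif ch in "'\"" and string_allowed:
            end = source.find(ch, i + 1)
            if end == -1:
                raise ValueError("unterminated string literal")
            tokens.append(("string", source[i + 1:end]))
            i = end + 1
            string_allowed = False
        elif ch in OPERATORS or ch in OPENERS:
            tokens.append(("punct", ch))
            i += 1
            string_allowed = True  # a literal may follow an operator or opener
        elif ch in CLOSERS:
            tokens.append(("punct", ch))
            i += 1
            string_allowed = False
        else:
            # Read a name; quotes met here are kept as part of the name.
            start = i
            while i < len(source) and source[i] not in NAME_STOP:
                i += 1
            tokens.append(("name", source[start:i]))
            string_allowed = False
    return tokens
```

With this rule, `tokenise("ex af, af'")` keeps `af'` as a single name, while `tokenise("ld a, 'x'")` reads `'x'` as a string because it follows a comma.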

Quote:
Case conversions can be handled differently depending on locale (check dotted and dotless i in Turkic languages). You should exclude that too.
I should exclude Turkish dotted and dotless "i"s? The thread's locale is set to the invariant culture, which is roughly based on English rules.

Quote:
I’m not sure if using . for module notation is the best solution, since it is already used for directives, and it’s probably not a good idea to allow them inside name literals. On the other hand, it’s the most appealing choice when it comes to visuals. Am I right to think that you’ll canonise label names internally as soon as they are parsed?
Directives can only start with a ".", whereas in a label a "." would only ever appear inside the label's path (so .x would never be a valid label name).

I've slightly revised the way numeric constants, string constants and label names are detected - see the updated file.

Quote:
As for reusables, you could perhaps prefix them too with a special char, e.g. with backquote. That would take care of parsing difficulties.
A prefix is not enough, unfortunately, given that you can create a reusable label such as ++++. I do agree, though, that `++++` is a lot prettier than {++++}.

Quote:
Also, I’m still not convinced that an unknown name should be allowed to be implicitly defined as a label to the current PC. Sure, you’re thinking in terms of a sufficiently clever IDE, but it’s always an advantage if errors can be caught at language level.
I've also changed the rules for the way expressions are dealt with - they can either be evaluated (which just tries to return a value, so the "trick" of turning a label name on its own into "label=$" doesn't work), or executed (which will allow the trick but also requires that at least one assignment is made).

The only time an unknown label is created is using the = operator, though - an expression such as x+=1 needs to know what x is before it writes back to it (to perform the addition) so will fail anyway.
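
The evaluate/execute split can be sketched roughly as follows. This is an illustrative Python model only, not Brass's parser: the function names, the `UnknownLabel` exception and the tiny grammar (a bare name, `name=term` or `name+=term`, where a term is a decimal number, `$` or a label) are all assumptions.

```python
class UnknownLabel(Exception):
    pass

def value_of(term, labels, pc):
    """Resolve a single term: '$' is the current PC, digits are numbers."""
    if term == "$":
        return pc
    if term.isdigit():
        return int(term)
    if term in labels:
        return labels[term]
    raise UnknownLabel(term)

def evaluate(expr, labels, pc):
    """Evaluation only returns a value; it never creates a label."""
    return value_of(expr.strip(), labels, pc)

def execute(expr, labels, pc):
    """Execution must assign; a bare name is treated as 'name=$'."""
    expr = expr.replace(" ", "")
    if "+=" in expr:
        name, rhs = expr.split("+=", 1)
        # Reading the old value fails if the label does not exist yet.
        labels[name] = value_of(name, labels, pc) + value_of(rhs, labels, pc)
    elif "=" in expr:
        name, rhs = expr.split("=", 1)
        labels[name] = value_of(rhs, labels, pc)  # '=' may create the label
    else:
        labels[expr] = pc  # assumed behaviour of the "label on its own" trick
```

In this model `execute("start", labels, 0x100)` defines `start`, `evaluate("nowhere", labels, 0)` raises, and `execute("x+=1", labels, 0)` raises until `x` has been created with `=`.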


Posted: Fri 20 Oct, 2006 12:50 pm
MCF Legend

Joined: Mon 20 Dec, 2004 8:45 am
Posts: 1601
Location: Budapest, Absurdistan
benryves wrote:
I should exclude Turkish dotted and dotless "i"s? The thread's locale is set to the invariant culture, which is roughly based on English rules.

Well, if it takes care of case conversion, that’s okay. You know, in Turkic languages the lowercase-uppercase pairs are i-İ and ı-I, i.e. i and I can possibly mismatch given the wrong locale.

benryves wrote:
A prefix is not enough, unfortunately, given that you can create a reusable label such as ++++. I do agree, though, that `++++` is a lot prettier than {++++}.

Why? You could simply introduce a special kind of token that can only hold reusable labels, which have a well-defined shape (a backquote followed by a run of pluses or minuses). If you want to use addition or subtraction after such a label, a whitespace would be mandatory, but this is not typical usage, so why force the user to type a closing character?
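
A small Python sketch of such a dedicated token, assuming the shape described here (a backquote followed by a run of only pluses or only minuses). The regular expression and the helper name are illustrative, not part of Brass.

```python
import re

# Assumed shape: a backquote, then one or more '+' or one or more '-'.
REUSABLE = re.compile(r"`(\++|-+)")

def read_reusable(text, pos=0):
    """Return (label, next_pos) if a reusable label starts at pos, else None."""
    m = REUSABLE.match(text, pos)
    if m is None:
        return None
    return m.group(0), m.end()
```

The match is greedy, which is why whitespace would be mandatory before an addition: in "`++++ +1" the label is "`++++", but in "`+++++1" the fifth plus is swallowed into the label.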

I’ll read the new version soon.

_________________
The Independent Z80 Assembly Guide
Acelgoyobis
PindurTI


Posted: Fri 20 Oct, 2006 1:23 pm
Maxcoderz Staff

Joined: Thu 16 Dec, 2004 10:06 pm
Posts: 3064
Location: Croydon, England
Point taken on the reusable labels front.
Thanks for this, BTW. It's very useful. :)

Code:
using System;
using System.Threading;
using System.Globalization;

public static class Program {
    public static void Main() {

        Console.WriteLine("Using Turkish rules");
        Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
        CompareIs();

        Console.WriteLine("Using culture-invariant rules");
        Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;
        CompareIs();
   
   
    }

    private static void CompareIs() {

        string LowerDottedI =  "i";
        string LowerDotlessI = "ı";

        string UpperDottedI =  "İ";
        string UpperDotlessI = "I";


        // Turkish pairing, lower to upper: i -> İ and ı -> I.
        Console.WriteLine(LowerDottedI.ToUpper() == UpperDottedI);
        Console.WriteLine(LowerDotlessI.ToUpper() == UpperDotlessI);

        // Turkish pairing, upper to lower: İ -> i and I -> ı.
        Console.WriteLine(UpperDottedI.ToLower() == LowerDottedI);
        Console.WriteLine(UpperDotlessI.ToLower() == LowerDotlessI);

        // English-style pairing: I <-> i.
        Console.WriteLine(UpperDotlessI.ToLower() == LowerDottedI);
        Console.WriteLine(LowerDottedI.ToUpper() == UpperDotlessI);

    }
}

Code:
Using Turkish rules
True
True
True
True
False
False
Using culture-invariant rules
False
False
False
False
True
True

When the locale is set to Turkish (tr-TR), ı-I match up and i-İ match up, but i-I don't. When set to the invariant culture, those two pairs don't match - but i and I do.
Do you suggest I just document it as "case insensitive as far as Latin characters go, according to English rules"? :|


Posted: Fri 20 Oct, 2006 2:53 pm
MCF Legend

Joined: Mon 20 Dec, 2004 8:45 am
Posts: 1601
Location: Budapest, Absurdistan
Well, the situation is generally not that bad, but it might make sense to document certain quirks. A handy file for Unicode and cases:

http://www.unicode.org/Public/5.0.0/ucd/CaseFolding.txt
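
For illustration only, here is how locale-independent folding looks in Python, whose `str.casefold()` applies the default Unicode case folding (the common and full mappings in that file, not the Turkic-specific ones). Brass itself runs on .NET, so this shows the idea rather than its behaviour; the helper names are made up.

```python
def canonical_name(name):
    """Fold a label name for case-insensitive, locale-independent comparison."""
    return name.casefold()

def same_label(a, b):
    """Two spellings refer to the same label if their folded forms match."""
    return canonical_name(a) == canonical_name(b)
```

Under default folding I and i compare equal, dotless ı stays distinct from both, and İ folds to i followed by a combining dot (U+0307), so it does not equal plain i either. Those are exactly the quirks worth documenting.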

Posted: Mon 23 Oct, 2006 10:47 am
Maxcoderz Staff

Joined: Thu 16 Dec, 2004 10:06 pm
Posts: 3064
Location: Croydon, England
Well, the revised system (of breaking down source into tokens/expressions/commands) seems to work (I now have Brass assembling some Chip-8/SCHIP programs).

I'm thinking about merging parentheses and brackets with the previous token if the previous token is a label constant. [ ] would be used as indexers and ( ) would be used for "functions".

Part of this is to do with macros; I'm not really sure of the best way to do this, but I'm thinking that a macro plugin could hook in at various points (text level, token level, expression group level or command level) to transform the input. A directive plugin to respond to .define (for example) would also have to be written.

Directive plugins now also operate; so far I have .db, .if/.else/.endif and .rept/.loop working merrily.

It's all coming together far too slowly for my liking. :(


Posted: Wed 25 Oct, 2006 1:55 pm
Calc King

Joined: Sat 05 Aug, 2006 7:22 am
Posts: 1513
Hey ben, could you make something that would let us split up "parameters" in macros?
That really needs explanation: suppose you have a macro that takes a register pair as an "argument", but somewhere in the code you have to use either half of the pair. So when you have hl you have to use h or l. Normally you then have to write a long .if/.else block to split the register pairs, but it would help if you could do: .define blabla(xx,yy) ld xh,xl \ ld yl,yh or something like it, which of course should throw errors and fail when xx or yy aren't register pairs.
Or instead of xh and xl, something like xx,1,1 (in string xx, starting at 1, 1 long) and xx,2,1 (in string xx, starting at 2, 1 long), which would support all kinds of strings.


Posted: Wed 25 Oct, 2006 2:01 pm
Maxcoderz Staff

Joined: Thu 16 Dec, 2004 10:06 pm
Posts: 3064
Location: Croydon, England
Due to a curious quirk (er, "bug"?) in the current parser, you can actually do this:

Code:
.define increment(reg_h, reg_l) inc reg_h reg_l
increment(h,l) ; Assembles as "inc h l" = "inc hl"
.define increment({hl})increment(h, l)
increment(hl)


Posted: Wed 25 Oct, 2006 2:12 pm
Calc King

Joined: Sat 05 Aug, 2006 7:22 am
Posts: 1513
Hey, that's funny, but how about longer strings?
Thanks anyway, I'm going to rewrite some code.
Could it be official in newer versions though, please?
Something a bit like a sub( function, I mean?


Posted: Wed 25 Oct, 2006 2:46 pm
Maxcoderz Staff

Joined: Thu 16 Dec, 2004 10:06 pm
Posts: 3064
Location: Croydon, England
Spaces and tabs are mercilessly stripped from expressions outside of string constants, so as long as the instruction still matches, it'll work.

By that, I mean that "in c hl" wouldn't work, as it expects to match against "inc *". "inc h l" does, as the * matches the "h l", which then has the space removed.
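
A rough Python model of that behaviour, assuming the mnemonic must match as written and only the wildcard part has its whitespace stripped. The function is hypothetical and ignores string constants entirely.

```python
def match_instruction(line, pattern="inc *"):
    """Match `line` against a 'mnemonic *' pattern; return the operand or None."""
    mnemonic = pattern.split()[0]
    parts = line.strip().split(None, 1)  # split off the first word only
    if len(parts) != 2 or parts[0].lower() != mnemonic:
        return None
    # Spaces and tabs inside the wildcard part are stripped.
    return "".join(parts[1].split())
```

So `match_instruction("inc h l")` yields `"hl"`, whereas `match_instruction("in c hl")` fails because the first word is not `inc`.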

Yes, it's bad to have this silly side-effect. ;)

Brass 2 will support text-processing plugins (to allow for macros), so although a "sub" function wouldn't be built in (nothing is built in), I daresay it would be possible to write your own. The way that text-processing plugins will operate still hasn't been finalised, so you'll have to wait on that count, I'm afraid.

In terms of updates on the project...

Image

Here's a sample of error reporting. Sorry about the colour scheme, but I think it's nicer to be like that instead of just giving you a line number and telling you there's a problem "somewhere" in it. All plugins can display their own errors - all you need to do is provide a message and pass the object with a problem. The assembler can then work out what the object is, where it is in the source, and format a nice error message.

Development continues... slowly.


Posted: Wed 25 Oct, 2006 3:04 pm
Calc King

Joined: Sat 05 Aug, 2006 7:22 am
Posts: 1513
Sounds good though, as does the possibility to write our own macro plugins. Looking forward to Brass 2! :D


Posted: Wed 25 Oct, 2006 3:11 pm
Maxcoderz Staff

Joined: Thu 16 Dec, 2004 10:06 pm
Posts: 3064
Location: Croydon, England
King Harold wrote:
Sounds good though, as does the possibility to write our own macro plugins.
Well, I need to think about that. Currently it parses the source code first, then it runs through and executes it in two passes. This is a problem - seeing as the macros have to run during the first stage, but are only defined during the stage where I execute directives, it won't work.
It might be enough to let directives execute during the initial text parsing stage - that way .include can also be used, for example, to insert source during that early stage.


Posted: Wed 25 Oct, 2006 4:07 pm
Calc King

Joined: Sat 05 Aug, 2006 7:22 am
Posts: 1513
So then .define would let the macro plugin know about a macro in the first pass right?


Posted: Wed 25 Oct, 2006 4:34 pm
Maxcoderz Staff

Joined: Thu 16 Dec, 2004 10:06 pm
Posts: 3064
Location: Croydon, England
Yes, the macro DLL would have to provide at least two plugins - a directive plugin (to respond to .define) and a text processing plugin (to transform the input).

(When I say "plugin", I refer to a single feature - a single output format, a single directive, that sort of thing. You can wrap hundreds of these, if you so wish, into a single DLL).

I'm going to provide a macro_tasm.dll which provides TASM-style macros along with .ifdef and .ifndef directives. On that count, the main collection of "standardised" plugins appears in an assembly called mix_core.dll. This DLL also provides a bunch of helper functions for other plugin authors. For example, it provides .if/.else/.endif directives, and it exposes methods to let you add directives that work in a similar manner but under a different name.

If I call the helper function for "If" support from my ".ifdef" plugin, I can then get seamless integration with .else and .endif without having to reinvent them myself (as .elsedef and .endifdef, for example).
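
The idea of sharing the "If" helper can be sketched like this. It is a Python illustration with invented names (`Conditionals`, `assemble`, a dictionary of directives); the real mix_core.dll API is not shown in this thread, and `.if` here only understands the literal `1` as true.

```python
class Conditionals:
    """Shared state for .if-style directives (hypothetical helper)."""

    def __init__(self):
        self.stack = []  # one bool per open conditional block

    def open(self, condition):  # used by .if, .ifdef, .ifndef, ...
        self.stack.append(bool(condition))

    def flip(self):  # .else
        self.stack[-1] = not self.stack[-1]

    def close(self):  # .endif
        self.stack.pop()

    @property
    def active(self):
        return all(self.stack)


def assemble(lines, defined):
    """Return the source lines that survive conditional assembly."""
    cond = Conditionals()
    directives = {
        ".if": lambda arg: cond.open(arg == "1"),
        ".else": lambda arg: cond.flip(),
        ".endif": lambda arg: cond.close(),
        # A macro plugin adds these by calling the same helper,
        # so .else and .endif work with them unchanged.
        ".ifdef": lambda arg: cond.open(arg in defined),
        ".ifndef": lambda arg: cond.open(arg not in defined),
    }
    out = []
    for line in lines:
        word, _, arg = line.strip().partition(" ")
        if word in directives:
            directives[word](arg.strip())
        elif cond.active:
            out.append(line.strip())
    return out
```

Because every opening directive pushes onto the same stack, `.ifdef TI83 ... .else ... .endif` needs no `.elsedef` or `.endifdef` of its own.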


Posted: Thu 26 Oct, 2006 3:57 pm
Maxcoderz Staff

Joined: Thu 16 Dec, 2004 10:06 pm
Posts: 3064
Location: Croydon, England
Here's a bit of a poser.

Currently, the instruction counter and the position in the output file are decoupled.

By this, I mean that if I write $+=10 the instruction counter is advanced by ten units, but if I output anything it would output next to anything I'd previously output, not 10 bytes later.

This has some advantages. For example, I'm currently supporting a "Byte Translater" plugin that sits between the assembler and the output plugin, so that it can work on each byte that is sent to be output. A good example is unsquished TI-83/83+ programs, where each byte written is expanded to 2 bytes (two ASCII characters). That way, whilst for each byte the instruction counter is incremented once, the output merrily writes two.

Another obvious advantage is when you want to relocate blocks of code. Whilst the assembler and your collection of directives all assume that the instruction pointer is sitting merrily in some block of memory outside of your program, bytes are still written sequentially to the output.

The problem is when someone decides to use .org to skip bytes.

For the average TI programmer, I'd hope that wouldn't be much of a problem - .org $9D93 once at the top of their source and they're happy.

One potential workaround is to code the .org directive so that it writes filler bytes between the old instruction counter and the new. However, this would stop you from using .org <some earlier address>.
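
To make the decoupling concrete, here is a minimal Python model. The class, its method names and the translator hook are assumptions for illustration, not Brass's design. The instruction counter moves once per source byte, while the output only ever grows by whatever the translator returns.

```python
class Assembler:
    def __init__(self, translate=lambda b: bytes([b])):
        self.pc = 0                 # instruction counter ($)
        self.output = bytearray()   # bytes are always appended sequentially
        self.translate = translate  # optional per-byte translator

    def org(self, address):
        self.pc = address           # moves the counter only; writes nothing

    def emit(self, *values):
        for value in values:
            self.output += self.translate(value & 0xFF)
            self.pc += 1            # one unit per source byte, whatever was written


def unsquished(byte):
    """Expand one byte to two ASCII hex characters."""
    return b"%02X" % byte
```

With `Assembler(unsquished)`, `org(0x9D93)` followed by `emit(0x3E, 0x01)` leaves the counter at 0x9D95 while the output holds the four characters `3E01`. A filler-writing `.org` would instead have to append `address - pc` bytes, which only makes sense when the new address is not below the current one.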

Any thoughts?


Posted: Thu 26 Oct, 2006 4:57 pm
MCF Legend

Joined: Mon 20 Dec, 2004 8:45 am
Posts: 1601
Location: Budapest, Absurdistan
How many people use .org as a filler? It seems to me that using it as a PC-only relocator (which is what makes sense to me) wouldn’t break too much code out in the wild, and those few who play such tricks would have the mental capacity to figure out the new rules. Am I mistaken?
