Brass - 1.0.5.3 update [06/02/2014]

One suite to code them all. A complete IDE and assembler for all your z80 projects!

Moderators: benryves, kv83

benryves
Maxcoderz Staff
Posts: 3087
Joined: Thu 16 Dec, 2004 10:06 pm
Location: Croydon, England

Post by benryves »

Updated syntax.txt
CoBB wrote:
benryves wrote:Characters which would not be allowed would include any whitespace (tab or space), any character used in an operator, any character that is seen as punctuation ( ) [ ] , \, directive marker (# or .) and constant prefixes ($ % @). This is quickly off the top of my head, so if you can think of any further items that are definite no-nos, please tell me!
The tricky part is thinking in unicode again. You should exclude every kind of punctuation and white space, only allowing underscore and apostrophe (which is needed for shadow registers with your tokeniser; by the way, doesn’t that cause trouble with single-quoted literals?).
Good points. As for the apostrophe - my inner pedant gets grumpy with label names such as "DontRender", so yes - it would be nice to allow it. ;)
The tokeniser is sufficiently "bright" to allow for AF' (and not catch it as "AF and oh we're about to read a string") by only allowing string literals to be declared under certain situations (after an operator, after an opening bracket/parenthesis, at the start of a string, &c).
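To illustrate the idea of a context-sensitive apostrophe, here is a minimal Python sketch (not Brass's actual tokeniser, which is not shown in this thread). It assumes a `'` opens a string literal only at the very start of the input or directly after an operator, comma or opening bracket; anywhere else it is treated as part of a name, so `af'` survives intact. The names `tokenise`, `STRING_MAY_FOLLOW` and `PUNCTUATION` are hypothetical.

```python
# Tokens after which a ' is taken to open a string literal (an assumption for this sketch).
STRING_MAY_FOLLOW = set("+-*/=,([")
PUNCTUATION = STRING_MAY_FOLLOW | set(")]")

def tokenise(source):
    """Split a line into tokens, treating ' as a quote only in certain positions."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        previous = tokens[-1] if tokens else None
        if ch.isspace():
            i += 1
        elif ch == "'" and (previous is None or previous in STRING_MAY_FOLLOW):
            end = source.index("'", i + 1)   # raises ValueError if unterminated
            tokens.append(source[i:end + 1])
            i = end + 1
        elif ch in PUNCTUATION:
            tokens.append(ch)
            i += 1
        elif ch.isalnum() or ch in "_'":
            # Names may contain apostrophes, which covers shadow registers like af'.
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] in "_'"):
                i += 1
            tokens.append(source[start:i])
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens
```

With these rules, `ex af,af'` keeps `af'` as a single name token, while `ld a,'x'` yields the string literal `'x'` because the apostrophe follows a comma.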
CoBB wrote:Case conversions can be handled differently depending on locale (check dotted and dotless i in Turkic languages). You should exclude that too.
I should exclude Turkish dotted and dotless "i"s? The thread's locale is set to the invariant culture, which is roughly based on English rules.
CoBB wrote:I’m not sure if using . for module notation is the best solution, since it is already used for directives, and it’s probably not a good idea to allow them inside name literals. On the other hand, it’s the most appealing choice when it comes to visuals. Am I right to think that you’ll canonise label names internally as soon as they are parsed?
Directives can only start with a ".", whereas in a label the "." would only ever appear inside the label's path (so .x would never be a valid label name).

I've slightly revised the way numeric constants, string constants and label names are detected - see the updated file.
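As a rough illustration of the label-name rule discussed earlier in this post (no whitespace and no punctuation, except underscore and apostrophe), here is a minimal Python sketch. It is not taken from syntax.txt: the function names and the choice of Unicode general categories are assumptions, and it only checks a single name component, not a dotted module path.

```python
import unicodedata

# Hypothetical whitelist of punctuation that is still allowed in names.
ALLOWED_PUNCTUATION = {"_", "'"}

def is_label_char(ch: str) -> bool:
    """Return True if ch may appear in a label name under the sketched rule."""
    if ch in ALLOWED_PUNCTUATION:
        return True
    if ch.isspace():
        return False
    # General categories: P* punctuation, S* symbols (+ $ = ...),
    # Z* separators, C* control and format characters.
    return unicodedata.category(ch)[0] not in {"P", "S", "Z", "C"}

def is_label_name(name: str) -> bool:
    """A name is valid if it is non-empty and every character is allowed."""
    return bool(name) and all(is_label_char(ch) for ch in name)
```

Under this sketch `Don't_Render` is accepted, while `a+b`, `x y` and `$ff` are rejected because they contain an operator, a space and a constant prefix respectively.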
CoBB wrote:As for reusables, you could perhaps prefix them too with a special char, e.g. with backquote. That would take care of parsing difficulties.
A prefix is not enough, unfortunately, given that you can create a reusable label such as ++++. I do agree, though, that `++++` is a lot prettier than {++++}.
CoBB wrote:Also, I’m still not convinced that an unknown name should be allowed to be implicitly defined as a label to the current PC. Sure, you’re thinking in terms of a sufficiently clever IDE, but it’s always an advantage if errors can be caught at language level.
I've also changed the rules for how expressions are dealt with - they can either be evaluated (which just tries to return a value, so the "label name on its own" trick of turning it into "label=$" doesn't work), or executed (which allows the trick but also requires that at least one assignment is made).

The only time an unknown label is created is using the = operator, though - an expression such as x+=1 needs to know what x is before it writes back to it (to perform the addition) so will fail anyway.
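The difference between evaluating and executing can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Assembler` class, its method names and the tiny expression grammar (a bare name, `name=operand` or `name+=operand`) are all hypothetical and far simpler than a real expression parser.

```python
import re

class Assembler:
    def __init__(self, pc=0):
        self.pc = pc          # current value of $
        self.labels = {}

    def _operand(self, text):
        """Resolve a number, $ or an existing label; unknown names are errors."""
        if text == "$":
            return self.pc
        if text.isdigit():
            return int(text)
        if text in self.labels:
            return self.labels[text]
        raise NameError(f"unknown label {text!r}")

    def evaluate(self, expr):
        """Evaluation only returns a value; it never defines anything."""
        return self._operand(expr.strip())

    def execute(self, expr):
        """Execution must assign; a bare name is rewritten to name=$."""
        expr = expr.replace(" ", "")
        if re.fullmatch(r"\w+", expr) and not expr.isdigit():
            expr += "=$"                          # the "label on its own" trick
        match = re.fullmatch(r"(\w+)(\+?=)(.+)", expr)
        if not match:
            raise SyntaxError("an executed expression needs an assignment")
        name, op, rhs = match.groups()
        value = self._operand(rhs)
        if op == "+=":
            value += self._operand(name)          # fails if name is unknown
        self.labels[name] = value
        return value
```

In this model `execute("start")` defines `start` at the current `$`, `evaluate("missing")` raises, and `execute("x+=1")` raises unless `x` was first created with `=`.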
CoBB
MCF Legend
Posts: 1601
Joined: Mon 20 Dec, 2004 8:45 am
Location: Budapest, Absurdistan

Post by CoBB »

benryves wrote:I should exclude Turkish dotted and dotless "i"s? The thread's locale is set to the invariant culture, which is roughly based on English rules.
Well, if it takes care of case conversion, that’s okay. You know, in Turkic languages the lowercase-uppercase pairs are i-İ and ı-I, i.e. i and I can possibly mismatch given the wrong locale.
benryves wrote:A prefix is not enough, unfortunately, given that you can create a reusable label such as ++++. I do agree, though, that `++++` is a lot prettier than {++++}.
Why? You could simply introduce a special kind of token that can only hold reusable labels, which have a well-defined shape (a backquote followed by a run of pluses or minuses). If you want to use addition or subtraction after such a label, a whitespace would be mandatory, but this is not typical usage, so why force the user to type a closing character?

I’ll read the new version soon.
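The "well-defined shape" of such a token (a backquote followed by a run of pluses or a run of minuses) fits in a single regular expression. A minimal Python sketch, with the hypothetical names `REUSABLE_LABEL` and `read_reusable_label`:

```python
import re

# A backquote followed by a run of pluses or a run of minuses (never mixed).
REUSABLE_LABEL = re.compile(r"`(?:\++|-+)")

def read_reusable_label(source, pos=0):
    """Return the reusable-label token starting at pos, or None."""
    match = REUSABLE_LABEL.match(source, pos)
    return match.group(0) if match else None
```

Because the match is greedy, the text `` `+++1 `` is read as the label `` `+++ `` followed by `1`, which is why whitespace would be mandatory before an addition: `` `++ +1 `` yields the label `` `++ ``.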

Post by benryves »

Point taken on the reusable labels front.
Thanks for this, BTW. It's very useful. :)

Code:

using System;
using System.Threading;
using System.Globalization;

public static class Program {
    public static void Main() {

        Console.WriteLine("Using Turkish rules");
        Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
        CompareIs();

        Console.WriteLine("Using culture-invariant rules");
        Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;
        CompareIs();
    }

    private static void CompareIs() {

        string LowerDottedI =  "i";
        string LowerDotlessI = "ı";

        string UpperDottedI =  "İ";
        string UpperDotlessI = "I";


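        // ToUpper() and ToLower() use the current thread culture.
        // Turkish pairings, uppercasing: i -> İ and ı -> I.
        Console.WriteLine(LowerDottedI.ToUpper() == UpperDottedI);
        Console.WriteLine(LowerDotlessI.ToUpper() == UpperDotlessI);

        // Turkish pairings, lowercasing: İ -> i and I -> ı.
        Console.WriteLine(UpperDottedI.ToLower() == LowerDottedI);
        Console.WriteLine(UpperDotlessI.ToLower() == LowerDotlessI);

        // English pairing: I -> i and i -> I.
        Console.WriteLine(UpperDotlessI.ToLower() == LowerDottedI);
        Console.WriteLine(LowerDottedI.ToUpper() == UpperDotlessI);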

    }
}

Code:

Using Turkish rules
True
True
True
True
False
False
Using culture-invariant rules
False
False
False
False
True
True
When the locale is set to Turkish (tr-TR), ı-I match up and i-İ match up, but i-I don't. When it is set to the invariant culture, neither of those pairs matches, but i and I do.
Do you suggest I just document it as "case insensitive as far as Latin characters go according to rules in English"? :|

Post by CoBB »

Well, the situation is generally not that bad, but it might make sense to document certain quirks. A handy file for unicode and cases:

http://www.unicode.org/Public/5.0.0/ucd/CaseFolding.txt
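For comparison, Python's built-in `str.casefold()` applies Unicode case folding of the kind tabulated in that file, independently of any locale. The sketch below is only an illustration in Python (Brass itself runs on .NET, whose behaviour is shown in the C# test above); `same_label` is a hypothetical helper name.

```python
def same_label(a: str, b: str) -> bool:
    """Compare two label names using locale-independent Unicode case folding."""
    return a.casefold() == b.casefold()

print(same_label("DontRender", "DONTRENDER"))  # True
print(same_label("i", "I"))                    # True
print(same_label("ı", "I"))                    # False: dotless ı does not fold to i
print(same_label("İ", "i"))                    # False: İ folds to i plus a combining dot
```

So with plain case folding the English i-I pairing holds and the two Turkish-specific letters simply never match anything but themselves, which is one of the quirks that might be worth documenting.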

Post by benryves »

Well, the revised system (of breaking down source into tokens/expressions/commands) seems to work (I now have Brass assembling some Chip-8/SCHIP programs).

I'm thinking about merging parentheses and brackets with the previous token if the previous token is a label constant. [ ] would be used as indexers and ( ) would be used for "functions".

Part of this is to do with macros; I'm not really sure of the best way to do this, but I'm thinking that a macro plugin could hook in at various points (text level, token level, expression group level or command level) to transform the input. A directive plugin to respond to .define (for example) would also have to be written.

Directive plugins now also operate; so far I have .db, .if/.else/.endif and .rept/.loop working merrily.

It's all coming together far too slowly for my liking. :(
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

Hey ben, could you make something that would let us split up "parameters" in macros?
That really needs explanation: Suppose you have a macro that takes a register pair as an "argument", but somewhere along the code you have to use either half of the paired register. So when you have hl you have to use h or l. Normally you then have to make a long .if/.else block to split the register pairs, but it would be nice if you could do: .define blabla(xx,yy) ld xh,xl \ ld yl,yh or something like it, which of course should throw errors and fail when xx or yy aren't register pairs.
Or instead of xh and xl, something like xx,1,1 (in string xx, starting at 1, 1 long) and xx,2,1 (in string xx, starting at 2, 1 long), which would support all kinds of strings.
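Both requests can be sketched in Python to make the intended behaviour concrete. Nothing here is Brass syntax: `split_pair` and `substring` are hypothetical helpers, and the set of pairs is limited to bc, de and hl because the halves of ix and iy are not single letters.

```python
REGISTER_PAIRS = {"bc", "de", "hl"}

def split_pair(argument):
    """Split a register pair such as "hl" into its halves ("h", "l")."""
    name = argument.strip().lower()
    if name not in REGISTER_PAIRS:
        raise ValueError(f"{argument!r} is not a register pair")
    return name[0], name[1]

def substring(text, start, length):
    """1-based substring, mirroring the proposed xx,1,1 and xx,2,1 notation."""
    return text[start - 1:start - 1 + length]
```

Here `split_pair("hl")` gives `("h", "l")` and fails loudly for anything that is not a pair, while `substring("hl", 2, 1)` gives `"l"` and works on strings of any length.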

Post by benryves »

Due to a curious quirk (er, "bug"?) in the current parser, you can actually do this:

Code:

.define increment(reg_h, reg_l) inc reg_h reg_l
increment(h,l) ; Assembles as "inc h l" = "inc hl"
.define increment({hl})increment(h, l)
increment(hl) ; Forwards to increment(h, l), so this is also "inc hl"

Post by King Harold »

Hey, that's funny, but how about longer strings?
Thanks anyway, I'm going to rewrite some code...
Could it be official in newer versions though, please?
Something a bit like a sub( function, I mean?

Post by benryves »

Spaces and tabs are mercilessly stripped from expressions outside of string constants, so as long as the instruction still matches, it'll work.

By that, I mean that "in c hl" wouldn't work, as it expects to match against "inc *". "inc h l" does, as the * matches "h l", which then has the space removed.
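A minimal Python sketch of that side-effect, assuming the mnemonic is matched literally and only the part captured by the wildcard has its spaces and tabs stripped. The function name `match_instruction` is hypothetical and this is not how the Brass parser is actually written.

```python
import re

def match_instruction(line, mnemonic="inc"):
    """Match "<mnemonic> *" and strip all spaces and tabs from the operand."""
    found = re.fullmatch(re.escape(mnemonic) + r"[ \t]+(.+)", line.strip())
    if not found:
        return None                       # the mnemonic itself must be intact
    operand = re.sub(r"[ \t]+", "", found.group(1))
    return mnemonic, operand
```

`match_instruction("inc h l")` returns `("inc", "hl")`, whereas `match_instruction("in c hl")` returns `None` because the space falls inside the mnemonic rather than inside the wildcard.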

Yes, it's bad to have this silly side-effect. ;)

Brass 2 will support text-processing plugins (to allow for macros), so whereas a "sub" function wouldn't be built-in (nothing is built-in), I daresay it would be possible to write your own. The way that text-processing plugins will operate still hasn't been finalised, so you'll have to wait on that count I'm afraid.

In terms of updates on the project...

[Image: sample of error reporting]

Here's a sample of error reporting. Sorry about the colour scheme, but I think it's nicer to be like that instead of just giving you a line number and telling you there's a problem "somewhere" in it. All plugins can display their own errors - all you need to do is provide a message and pass the object with a problem. The assembler can then work out what the object is, where it is in the source, and format a nice error message.
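The idea of "pass the object with a problem and let the assembler find it in the source" could look roughly like the Python sketch below. The `Token` fields and the `format_error` function are assumptions for illustration only; the real plugin interface is not shown in this thread.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    line: int      # 1-based line number in the source
    column: int    # 0-based column of the token's first character

def format_error(source_lines, token, message):
    """Build an error message that underlines the offending token in its line."""
    line_text = source_lines[token.line - 1]
    underline = " " * token.column + "^" * len(token.text)
    return f"Error on line {token.line}: {message}\n{line_text}\n{underline}"
```

A plugin would only supply the message and the token; the position and the highlighted source line come from information the token already carries.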

Development continues... slowly.

Post by King Harold »

Sounds good though, as does the possibility to write our own macro plugins. Looking forward to Brass 2! :D

Post by benryves »

King Harold wrote:Sounds good though, as does the possibility to write our own macro plugins.
Well, I need to think about that. Currently it parses the source code first, then it runs through and executes it in two passes. This is a problem - seeing as the macros have to run during the first stage, but are only defined during the stage where I execute directives, it won't work.
It might be enough to let directives execute during the initial text parsing stage - that way .include can also be used, for example, to insert source during that early stage.
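A toy Python sketch of that idea: the directive is acted on while the text is still being read, so a macro is known to every later line of the same pass. It assumes simple parameterless `.define name replacement` macros and a hypothetical `preprocess` function; real macros with arguments would need more work.

```python
import re

def preprocess(lines):
    """Single text-level pass: .define takes effect as soon as it is read."""
    macros = {}
    output = []
    for line in lines:
        parts = line.strip().split(None, 2)
        if parts and parts[0] == ".define" and len(parts) >= 2:
            # Register the macro now, during the text stage.
            macros[parts[1]] = parts[2] if len(parts) > 2 else ""
            continue
        for name, replacement in macros.items():
            # A function is used as the replacement so that backslashes
            # in the macro body are inserted literally.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m: replacement, line)
        output.append(line)
    return output
```

Lines that appear before the `.define` are left untouched, which mirrors the ordering problem described above: the definition has to be seen before the text that uses it is transformed.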

Post by King Harold »

So then .define would let the macro plugin know about a macro in the first pass, right?

Post by benryves »

Yes, the macro DLL would have to provide at least two plugins - a directive plugin (to respond to .define) and a text processing plugin (to transform the input).

(When I say "plugin", I refer to a single feature - a single output format, a single directive, that sort of thing. You can wrap hundreds of these, if you so wish, into a single DLL).

I'm going to provide a macro_tasm.dll which provides TASM-style macros along with .ifdef and .ifndef directives. On that count, the main collection of "standardised" plugins appears in an assembly called mix_core.dll. This DLL also provides a bunch of helper functions for other plugin authors. For example, it provides .if/.else/.endif directives, and it exposes methods to let you add directives that work in a similar manner but under a different name.

If I call the helper function for "If" support from my ".ifdef" plugin, I can then get seamless integration with .else and .endif without having to reinvent them myself (as .elsedef and .endifdef, for example).
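The shape of such a shared helper can be sketched in Python. This is a guess at the design, not the mix_core.dll API: `ConditionalSupport`, `register_opener` and `handle` are hypothetical names. The point is that any number of "opening" directives can share one implementation of .else and .endif.

```python
class ConditionalSupport:
    """Hypothetical stand-in for a shared "If" helper."""

    def __init__(self):
        self.stack = []      # one flag per open conditional: is its branch active?
        self.openers = {}    # directive name -> predicate(argument) -> bool

    def register_opener(self, name, predicate):
        """Let another plugin add an .if-like directive under its own name."""
        self.openers[name] = predicate

    def active(self):
        """Source is assembled only when every enclosing branch is active."""
        return all(self.stack)

    def handle(self, directive, argument=""):
        """Process a directive; return False if it is not a conditional one."""
        if directive in self.openers:
            self.stack.append(bool(self.openers[directive](argument)))
        elif directive == ".else":
            self.stack[-1] = not self.stack[-1]
        elif directive == ".endif":
            self.stack.pop()
        else:
            return False
        return True
```

An ".ifdef" plugin would then only call `register_opener(".ifdef", lambda name: name in defined_macros)`, where `defined_macros` is whatever set of names the macro plugin keeps, and .else and .endif would work without being reimplemented.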

Post by benryves »

Here's a bit of a poser.

Currently, the instruction counter and the position in the output file are decoupled.

By this, I mean that if I write $+=10 the instruction counter is advanced by ten units, but if I output anything it would output next to anything I'd previously output, not 10 bytes later.

This has some advantages. For example, I'm currently supporting a "Byte Translater" plugin that sits between the assembler and the output plugin, so that it can work on each byte that is sent to be output. A good example is unsquished TI-83/83+ programs, where each byte written is expanded to 2 bytes (two ASCII characters). That way, whilst for each byte the instruction counter is incremented once, the output merrily writes two.

Another obvious advantage is when you want to relocate blocks of code. Whilst the assembler and your collection of directives all assume that the instruction pointer is sitting merrily in some block of memory outside of your program, bytes are still written sequentially to the output.
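The decoupling can be made concrete with a small Python model. The `Output` class and `unsquish` function are hypothetical; they only mirror the behaviour described above, in which the instruction counter advances by one per byte while a translator may write a different number of bytes to the file.

```python
class Output:
    def __init__(self, origin=0, translate=None):
        self.counter = origin                      # value of $
        self.data = bytearray()                    # bytes actually written
        self.translate = translate or (lambda b: bytes([b]))

    def emit(self, byte):
        self.data += self.translate(byte)          # may write more than one byte
        self.counter += 1                          # always advances by exactly one

    def skip(self, amount):
        self.counter += amount                     # like $+=10: nothing is written

def unsquish(byte):
    """Expand one byte into two ASCII hex digits."""
    return f"{byte:02X}".encode("ascii")
```

Emitting `0xC9`, skipping 10 and emitting `0x3E` through `unsquish` leaves the four characters `C93E` next to each other in the output, although the counter has moved on by 12.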

The problem is when someone decides to use .org to skip bytes.

For the average TI programmer, I'd hope that wouldn't be much of a problem - .org $9D93 once at the top of their source and they're happy.

One potential workaround is to code the .org directive so that it writes filler bytes between the old instruction counter and the new. However, this would stop you from using .org <some earlier address>.
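A sketch of that workaround in Python shows where it breaks down. The function name `org_with_filler` and the filler value 0xFF are assumptions for the example.

```python
def org_with_filler(output, counter, new_address, filler=0xFF):
    """Pad the output up to new_address and return the new instruction counter."""
    if new_address < counter:
        # There is nothing sensible to write when moving backwards.
        raise ValueError("cannot pad backwards to an earlier address")
    output.extend([filler] * (new_address - counter))
    return new_address
```

Moving forwards simply appends `new_address - counter` filler bytes, but a request for an earlier address has no meaningful padding, so the directive would have to reject it.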

Any thoughts?

Post by CoBB »

How many people use .org as a filler? It seems to me that using it as a PC-only relocator (which is what makes sense to me) wouldn’t break too much code out in the wild, and those few who play such tricks would have the mental capacity to figure out the new rules. Am I mistaken?