benryves wrote: Characters which would not be allowed would include any whitespace (tab or space), any character used in an operator, any character that is seen as punctuation ( ) [ ] , \, directive marker (# or .) and constant prefixes ($ % @). This is quickly off the top of my head, so if you can think of any further items that are definite no-nos, please tell me!
CoBB wrote: The tricky part is thinking in Unicode again. You should exclude every kind of punctuation and white space, only allowing underscore and apostrophe (which is needed for shadow registers with your tokeniser; by the way, doesn’t that cause trouble with single-quoted literals?).
Good points. As for the apostrophe - my inner pedant gets grumpy with label names such as "DontRender", so yes - it would be nice to allow it.
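For what it's worth, here is a rough Python sketch of the "exclude all punctuation and white space, allow only underscore and apostrophe" rule, using Unicode general categories. The function names and the decision to accept only letter and number categories are my own assumptions, not how the assembler actually does it. It checks a single name component, so "." is rejected here and any module path would have to be split beforehand.

```python
import unicodedata

# Extra characters allowed besides letters and digits (assumption based on
# the rule quoted above): underscore, and apostrophe for names like AF'.
EXTRA_ALLOWED = {"_", "'"}

def is_label_char(ch: str) -> bool:
    """Return True if ch may appear in a single label name component."""
    if ch in EXTRA_ALLOWED:
        return True
    # General categories starting with L are letters and N are numbers.
    # Punctuation (P*), symbols (S*), separators (Z*), marks (M*) and
    # control characters (C*) are all rejected.
    return unicodedata.category(ch)[0] in ("L", "N")

def is_valid_label_name(name: str) -> bool:
    """Return True if name is non-empty and every character is allowed."""
    return bool(name) and all(is_label_char(ch) for ch in name)
```

Note that this check alone would accept a name made only of digits; telling numeric constants apart from labels is a separate step.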
The tokeniser is sufficiently "bright" to allow for AF' (and not catch it as "AF and oh we're about to read a string") by only allowing string literals to be declared under certain situations (after an operator, after an opening bracket/parenthesis, at the start of a string, &c).
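To illustrate the idea (this is only a toy sketch in Python, not the real tokeniser), a quote character can be treated as the start of a string literal only when the previous token permits it. The operator set, the `tokenise` name and the decision to also allow strings after a comma are assumptions made for the example.

```python
# Hypothetical single-character operators and opening brackets.
OPERATORS = set("+-*/=<>&|^~!")
OPENERS = set("([")

def tokenise(source: str) -> list[str]:
    """Split source into tokens, treating quotes as context-sensitive."""
    tokens: list[str] = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
            continue
        prev = tokens[-1] if tokens else None
        # A string may start at the very beginning, after an operator,
        # after an opening bracket, or (assumption) after a comma.
        string_allowed = (
            prev is None or prev in OPERATORS or prev in OPENERS or prev == ","
        )
        if ch in "'\"" and string_allowed:
            end = source.find(ch, i + 1)
            if end == -1:
                raise ValueError("unterminated string literal")
            tokens.append(source[i:end + 1])
            i = end + 1
        elif ch.isalnum() or ch == "_":
            # A name may contain apostrophes after its first character,
            # so AF' is read as one token rather than AF plus a string.
            j = i
            while j < len(source) and (source[j].isalnum() or source[j] in "_'"):
                j += 1
            tokens.append(source[i:j])
            i = j
        else:
            tokens.append(ch)
            i += 1
    return tokens
```

With this, `ex af, af'` keeps `af'` as a single name, while `ld a, 'x'` still yields a string literal because the quote follows a comma.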
Quote: Case conversions can be handled differently depending on locale (check dotted and dotless i in Turkic languages). You should exclude that too.
I should exclude Turkish dotted and dotless "i"s? The thread's locale is set to the invariant culture, which is roughly based on English rules.
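As a rough illustration of why those two letters are awkward even without a Turkish locale, here is a small Python sketch. Python's `str.upper()` and `str.lower()` are locale-independent, which is similar in spirit to an invariant culture but not necessarily identical to it, and the helper name `casing_is_stable` is made up for the example.

```python
DOTLESS_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

def casing_is_stable(ch: str) -> bool:
    """True if ch gives consistent results when case-converted both ways.

    Characters that fail this cannot be safely folded for case-insensitive
    label lookup, so a parser might simply refuse them in names.
    """
    return (
        ch.upper().lower() == ch.lower()
        and ch.lower().upper() == ch.upper()
    )

# Dotless i uppercases to a plain "I", which lowercases to a dotted "i",
# so the original letter is lost on the way back.
assert DOTLESS_I.upper() == "I"
assert not casing_is_stable(DOTLESS_I)

# Dotted capital I lowercases to "i" plus a combining dot above (two code
# points), which does not uppercase back to the single original character.
assert DOTTED_CAP_I.lower() == "i\u0307"
assert not casing_is_stable(DOTTED_CAP_I)
```

Plain ASCII letters pass the check, so a rule like this would only reject the problematic characters rather than whole scripts.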
Quote: I’m not sure if using . for module notation is the best solution, since it is already used for directives, and it’s probably not a good idea to allow them inside name literals. On the other hand, it’s the most appealing choice when it comes to visuals. Am I right to think that you’ll canonise label names internally as soon as they are parsed?
Directives can only start with a ".", whereas in a label name a "." would only ever appear inside the label's path (so .x would never be a valid label name).
I've slightly revised the way numeric constants, string constants and label names are detected - see the updated file.
Quote: As for reusables, you could perhaps prefix them too with a special char, e.g. with backquote. That would take care of parsing difficulties.
A prefix is not enough, unfortunately, given that you can create a reusable label such as ++++. I do agree, though, that `++++` is a lot prettier than {++++}.
Quote: Also, I’m still not convinced that an unknown name should be allowed to be implicitly defined as a label to the current PC. Sure, you’re thinking in terms of a sufficiently clever IDE, but it’s always an advantage if errors can be caught at language level.
I've also changed the rules for the way expressions are dealt with - they can either be evaluated (which just tries to return a value, so the "trick" of turning a label name on its own into "label=$" doesn't work), or executed (which allows the trick but also requires that at least one assignment is made).
The only time an unknown label is created is using the = operator, though - an expression such as x+=1 needs to know what x is before it writes back to it (to perform the addition) so will fail anyway.
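A very reduced Python sketch of the evaluate/execute split might look like the following. Everything here (the `Assembler` class, the method names, and the fact that it only understands decimal numbers, `$`, `+`, `=` and `+=`) is an assumption for illustration, not the real expression parser.

```python
class Assembler:
    """Toy model: labels map names to values, pc is the current address."""

    def __init__(self, pc: int = 0):
        self.pc = pc
        self.labels: dict[str, int] = {}

    def _value(self, term: str) -> int:
        term = term.strip()
        if term == "$":
            return self.pc
        if term.isdigit():
            return int(term)
        if term in self.labels:
            return self.labels[term]
        raise NameError(f"unknown label: {term}")

    def evaluate(self, expr: str) -> int:
        """Return a value only; never creates or changes a label."""
        return sum(self._value(term) for term in expr.split("+"))

    def execute(self, statement: str) -> None:
        """Run a statement; it must make at least one assignment."""
        statement = statement.strip()
        if "+=" in statement:
            # Needs the old value first, so an unknown label fails here.
            name, rhs = statement.split("+=", 1)
            name = name.strip()
            self.labels[name] = self._value(name) + self.evaluate(rhs)
        elif "=" in statement:
            # Plain assignment is the only place a new label is created.
            name, rhs = statement.split("=", 1)
            self.labels[name.strip()] = self.evaluate(rhs)
        elif statement.isidentifier():
            # The "trick": a bare name is treated as "name=$".
            self.execute(f"{statement}=$")
        else:
            raise ValueError("statement makes no assignment")
```

For example, with `asm = Assembler(pc=0x100)`, calling `asm.execute("start")` defines `start` as the current PC, whereas `asm.evaluate("start")` on a fresh assembler raises `NameError` and `asm.execute("y += 1")` fails because `y` has no value to add to.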