
Identifier results when grammar is case insensitive

Oct 2, 2009 at 5:24 PM


I don't know if I would call this a bug, so I'm going to bring it up in a discussion thread first. While working on a case-insensitive grammar, I noticed an inconsistency in the results from an Identifier terminal. If the Identifier is alone in the expression and contains uppercase characters, the result is identical to the original text. However, if the Identifier is part of a larger expression, all of the uppercase characters become lowercase.

I believe this could become an issue when you start working on sub grammars.  The most common example is HTML with in-line JavaScript. If my memory serves me correctly, JavaScript is case sensitive while HTML is not - XHTML is case sensitive though =).



Oct 4, 2009 at 6:03 PM

Sounds like an inconsistency at least, will look into this. I know it is important to do this properly, and will try to address it when it comes to the symbol table implementation. The other interesting case, in addition to HTML/JavaScript, is a case-insensitive language (VB) that interoperates with .NET or some other case-sensitive language. Casing does not matter in the language itself, so 'x' and 'X' are the same, but exact casing becomes important when there is a call to an external .NET function such as System.Char.IsLetter(ch), because .NET is case sensitive.

Will look into this more. Thanks again for pointing this out.


Nov 11, 2009 at 2:44 PM

I'm back to this issue, and I'm able to reproduce it with the GwBasic grammar and a script like this:

1 x = xX

2 y = xX * 2

The first occurrence of xX shows as "xX" in the parse tree, while the second one shows as "xx".
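To make the expected result concrete, here is a rough Python sketch (not Irony's scanner; the names `identifiers` and `same_name` are made up for illustration). It assumes the simplest possible approach: keep each identifier's text exactly as written and only fold case when two names are compared.

```python
import re

# Hypothetical, simplified identifier pattern; a real GwBasic grammar is richer.
IDENT = re.compile(r"[A-Za-z_]\w*")

def identifiers(line):
    """Return the identifiers in a line with their original casing."""
    return IDENT.findall(line)

def same_name(a, b):
    """Case-insensitive comparison that leaves the token text untouched."""
    return a.casefold() == b.casefold()

script = ["1 x = xX", "2 y = xX * 2"]
found = [identifiers(line) for line in script]
print(found)
```

With this approach both occurrences of xX keep their casing, whether the identifier stands alone or sits inside a larger expression.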

Will investigate and fix it.

Good catch! - thanks again!


Nov 12, 2009 at 1:15 AM

Thanks for following up.

Nov 14, 2009 at 4:33 PM


The issue is fixed in the latest drop. The problem turned out to be a bit deeper than one would expect: it was actually several problems spread across several places. Now all identifiers show up "as is", with their original casing, even in case-insensitive languages. It is now the interpreter's responsibility to handle case variations if the language is case insensitive, by creating case-insensitive dictionaries for local and global variables.

thanks again
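For readers wondering what a case-insensitive dictionary for variables might look like, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not the interpreter's actual code: the class name `CaseInsensitiveScope` and its methods are hypothetical. Lookups ignore case, while the spelling first used for a name is kept so it can still be reported exactly as written.

```python
class CaseInsensitiveScope:
    """Variable table that ignores case on lookup but remembers the first spelling."""

    def __init__(self):
        # Maps case-folded name -> (original spelling, value).
        self._slots = {}

    def set(self, name, value):
        key = name.casefold()
        # Keep the spelling from the first assignment, if there was one.
        original = self._slots.get(key, (name, None))[0]
        self._slots[key] = (original, value)

    def get(self, name):
        return self._slots[name.casefold()][1]

    def spelling(self, name):
        return self._slots[name.casefold()][0]


scope = CaseInsensitiveScope()
scope.set("xX", 5)
scope.set("XX", 7)            # same variable in a case-insensitive language
print(scope.get("xx"))        # 7
print(scope.spelling("XX"))   # xX
```

Because the token text itself is never lowercased, a name that has to cross into a case-sensitive environment (the System.Char.IsLetter example earlier in the thread) can still be passed along with its exact casing.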