Malformed string literal

Aug 10, 2012 at 8:46 AM
Edited Aug 10, 2012 at 11:18 AM

Hello guys!

I implemented a grammar and managed to get rid of all conflicts. However, one problem seems to remain (I have tried to narrow it down as much as possible to keep the focus on it):

My grammar supports character literals, such as 'c', and attributes, such as variableName'AttributeName. The problem is two-fold, although I think solving one part also solves the other. Consider the following simple line of code:


assert c='a' and c'OldVal='b'


(1) Whenever I want to use an attribute (OldVal), the parser complains about a malformed string. Of course it does: the closing ' is missing and OldVal is longer than one character.

(2) Before the "=" operator, the grammar allows a simple identifier ("c" in the example) or an identifier's attribute ("c'OldVal"):

   = identifier 
   | identifier + "'" + identifier;

Looking only at the next token, the parser sees a valid identifier and reduces it to name. How can I implement a more sophisticated lookahead mechanism that also takes into account, e.g., the token after the lookahead token? An identifier followed by ' would unambiguously specify an attribute.


Note: The complete grammar is relatively complex, and ShiftIf() or ReduceIf() cannot be used, or at least I would not know how to use them here.

Any help or hint in the right direction is appreciated!


Aug 11, 2012 at 3:21 PM

It looks like your grammar is ambiguous at the character/terminal level. Take an expression like this (rearranged from your example):

assert 'a'=c and c'OldVal = b

When the scanner hits the quote char after [assert], it is unclear what the quote is: the start of a string literal, or the delimiter inside an identifier attribute.

I think the easiest fix is the following.

Define a separate terminal "idWithAttr", based on IdentifierTerminal, for an identifier with an attribute. Add the quote symbol as an allowed char inside (but not as the first char); you can do this with a constructor parameter. Hook into this terminal's ValidateToken event. Inside the handler, verify that the variable-with-attr is well formed (for example, that there is only one quote symbol inside).

You can use this "idWithAttr" as is in your grammar. Or you can automatically convert it into the token sequence "id + quote + id", right inside the ValidateToken handler. In this case, just create a multi-token and replace the current idWithAttr token with it (see sample code here: [link missing]). Note that in this case you need to add idWithAttr to NonGrammarTerminals (just like comment terminals).
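
To make the idea concrete, here is a minimal sketch in Python rather than Irony/C#. It does not use any Irony API; the names TOKEN_RE, split_word and tokenize, and the token kinds "kw", "id", "quote", "char" and "op", are made up for illustration. It assumes identifiers are ASCII letters, digits and underscores, and that "assert" and "and" are the only keywords. The scanner accepts quotes inside an identifier-like word (but not as its first char), then validates the word and expands it into "id + quote + id":

```python
import re

# Hypothetical keyword set for this sketch only.
KEYWORDS = {"assert", "and"}

TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<word>[A-Za-z_][A-Za-z0-9_']*)   # identifier; quote allowed, but not as first char
  | (?P<char>'[^']')                    # character literal such as 'a'
  | (?P<op>=)
""", re.VERBOSE)


def split_word(text):
    """Validate an identifier-like lexeme and expand it into tokens."""
    if "'" not in text:
        kind = "kw" if text in KEYWORDS else "id"
        return [(kind, text)]
    parts = text.split("'")
    # Exactly one quote, and the attribute part must itself be an identifier.
    if len(parts) != 2 or not parts[1].isidentifier():
        raise SyntaxError(f"malformed attribute reference: {text!r}")
    return [("id", parts[0]), ("quote", "'"), ("id", parts[1])]


def tokenize(source):
    """Scan the source into (kind, text) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "ws":
            continue
        if kind == "word":
            tokens.extend(split_word(text))
        else:
            tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    for token in tokenize("assert c='a' and c'OldVal='b'"):
        print(token)
```

Because a quote can only continue a word that has already started with a letter, a quote seen at the start of a token can only open a character literal, so the scanner never has to guess. One consequence of this design is that an identifier written directly against a character literal (such as c'a') is rejected as a malformed attribute reference.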


Aug 13, 2012 at 7:18 AM

Thank you very much for the quick reply, I will try your proposed solution!