Problems with basic plus rule

May 8, 2011 at 5:48 PM

I'm trying to create my first Irony grammar and am running into parsing errors. I want to parse the following string: "<region>a=b c=d", where a and c are "opcodes" and b and d are "values". However, my parser falls over with an unexpected EOF. Judging by the parse tree, it gets everything right up until c, which it interprets as a value instead of an opcode.

var region = new NonTerminal("region");
var setting = new NonTerminal("setting");
var opcode = new IdentifierTerminal("opcode");
var value = new IdentifierTerminal("value");
             
this.Root = region;
  
// <region> ::= "<region>" <setting>+
region.Rule = "<region>" + MakePlusRule(region, setting);
 
// <setting> ::= <opcode> "=" <value>
setting.Rule = opcode + "=" + value;

The parser then interprets my test string as:

<region> (Key symbol)
a (opcode)
= (Key symbol)
b (value)
c (value) <-- problem here, should be opcode
Unexpected end of file.

I'm sorry if this is a really basic newbie question, but any pointers in the right direction would be a great help.

thanks

Mark

 

Coordinator
May 8, 2011 at 5:51 PM

You are using MakePlusRule incorrectly: the non-terminal on the left-hand side of the rule must be the same as the first argument. You should create a new non-terminal, settingList:

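var settingList = new NonTerminal("settingList");

// <settingList> ::= <setting>+
settingList.Rule = MakePlusRule(settingList, setting);

// <region> ::= "<region>" <settingList>
region.Rule = "<region>" + settingList;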

May 8, 2011 at 6:34 PM

Thank you for the quick response. It works perfectly. I suspect I am going to have to ask a few more questions before I am finished with this grammar, but I'll do my best to solve them myself without pestering you too much.

Mark

May 8, 2011 at 6:54 PM

OK, two more questions on terminals if you have time.

First, I am using IdentifierTerminal for value when really value is more like a string literal but without any start and end delimiters. e.g. valid values might be 1.0, C#4 etc. Is there an appropriate built-in terminal type I can use?

Second, and this is where I fear things may get a bit complicated. This language actually allows spaces in values so for example:

name=John Smith age=50 birthplace=New York height=6'2"

would ideally parse to:

opcode: "name"
value: "John Smith"
opcode: "age"
value: "50"
opcode: "birthplace"
value: "New York"

etc

To achieve this do I have to define my "value" as a non-terminal allowing component pieces, or is there a terminal that can cope with spaces in non-delimited string literals and work out where to stop?

thanks again for your time & this excellent library

Mark

May 8, 2011 at 7:39 PM

I have to say that this looks hard. I am not an expert, but I find the string hard to read as a human, never mind constructing the grammar for a parser to read it! My eyes have to go backwards and forwards to work it out, so I imagine that a piece of computer code will have to do the same.

I assume the rule would be that opcodes never have spaces in them? (Otherwise it really would be totally ambiguous.) I would therefore imagine that an opcode is identified by having an "=" immediately after it; otherwise you keep reading the text as part of the previous value. I have to say that I can't see a way of doing this in Irony (but then again I am not the expert!) because my experience has been that Irony needs to make a decision as it reads each part, not make a decision about the previous part when it reads the next part. I say that based on my own experience of many hours getting rid of the conflicts in my own grammars. So about the only way I can see to do it is if you can guarantee that there are never whitespace characters between the opcode and the "=", and you define the opcode as a non-terminal that must end in the character "=" (somehow). Then Irony might stand a chance of knowing it is reading an opcode as it is reading it.

Alternatively, assuming that this is just a snippet of a bigger file to parse, I would be tempted to simply let Irony detect the "<region>", capture all the text that follows as a single string, and split it up with a dedicated piece of custom code post-parsing. I suppose this depends on whether or not you need the opcode and value pairs to be already split up in the parse tree. Or do you simply need the parse tree to know it has read a region with an undefined number of opcode and value pairs in it?

 

May 8, 2011 at 7:47 PM

@wmh, yes the opcodes never have spaces in them, and yes it is far from the best thought out file format. I suspected that this might be a hard problem. I don't think I can guarantee there won't be whitespace between the opcode and the "=" (and yes I know that there would be a definite ambiguity if a space were to appear after an "="). I have been reading some documentation and was wondering if I could do something creative with "Token Filters" and somehow read ahead in the scanner stage allowing me to push string delimiters back in the right places before it got to the parser stage. However, there is only one sample of a token filter and it looks complex.

I could write my own state machine based parser for this language (I already have one half-written). I just came looking at tools like Irony to see if I could find a more elegant solution.

Mark

May 8, 2011 at 9:34 PM

"Token Filters": can't comment personally. But then Irony definitely does seem to reward creativity and perseverance. Let us know how you get on.

Good luck,

Will.

Coordinator
May 9, 2011 at 3:54 AM

I agree with wmh that this kind of thing is better done by simpler, non-Irony code. You can just read the settingList element as one string, then split that string on spaces, discarding the empty entries (string.Split does this easily). After that, a simple loop over the resulting words would find all the "=" symbols and treat the symbol before each one as a key and the rest as values.
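A minimal sketch of this split-and-loop idea, in plain C# with no Irony dependency. The names SettingSplitter and Split are hypothetical, chosen only for illustration. It assumes that opcodes never contain spaces and that values never contain "=". It pads every "=" with spaces before splitting, so whitespace around "=" does not matter; as a side effect, runs of whitespace inside a value collapse to a single space.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Irony.
public static class SettingSplitter
{
    // Splits text such as "name=John Smith age=50" into (opcode, value) pairs.
    public static List<KeyValuePair<string, string>> Split(string text)
    {
        // Pad "=" with spaces so it always ends up as its own word,
        // then split on whitespace and drop the empty entries.
        string[] words = text.Replace("=", " = ")
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        var result = new List<KeyValuePair<string, string>>();
        string key = null;
        var valueWords = new List<string>();

        for (int i = 0; i < words.Length; i++)
        {
            // A word is an opcode when the word right after it is "=".
            bool nextIsEquals = i + 1 < words.Length && words[i + 1] == "=";
            if (nextIsEquals)
            {
                // Flush the pair collected so far, then start a new one.
                if (key != null)
                    result.Add(new KeyValuePair<string, string>(
                        key, string.Join(" ", valueWords)));
                key = words[i];
                valueWords.Clear();
                i++; // skip the "=" word
            }
            else if (key != null)
            {
                // Any other word belongs to the current value.
                valueWords.Add(words[i]);
            }
            // Words that appear before the first opcode are ignored.
        }

        // Flush the final pair.
        if (key != null)
            result.Add(new KeyValuePair<string, string>(
                key, string.Join(" ", valueWords)));
        return result;
    }
}
```

Calling SettingSplitter.Split on the example line from earlier in the thread would give the pairs name / "John Smith", age / "50", birthplace / "New York" and height / 6'2". Looking one word ahead is what lets the loop decide that a word is an opcode rather than the tail of the previous value, which is the decision that is awkward to express in the grammar itself.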

Roman 

May 9, 2011 at 10:40 PM

OK thanks for the help

Mark