What to do when lookahead is needed?

Jan 8, 2010 at 4:07 AM

Sorry if this has been covered before, but I'm new to Irony. I am trying to write a grammar for a domain-specific language that has two forms for method calls. The first is "call M(x,y,z)", where M is the method name and x, y, and z are the arguments to the method. Methods may have an arbitrary number of return values (instead of just one), so for binding the return values there is a second form: "call a,b := M(x,y,z)" (here M has two return values; of course, I am not trying to capture the number of return values in the grammar). For LL parsers with backtracking, I would write the rule as:

callStmt.Rule = "call" + (identifierList + ":=" | Empty) + methodCall;

(Where the production for the nonterminal methodCall recognizes strings like "M(x,y,z)".)

But with the LALR parser Irony produces, since both nonterminals identifierList and methodCall can begin with an identifier, I get parse errors for "call M(x,y,z)" because it grabs "M" as an identifier and then wants to see either a "," (the separator for the nonterminal identifierList) or the ":=" token.

What is the way to do this in Irony?

Thanks!

Mike

Coordinator
Jan 8, 2010 at 5:42 AM

The first question to ask is: does Irony report any grammar conflicts? If yes, then we have to resolve the conflicts first, before trying to parse any text.

Otherwise, if there are no grammar conflicts, then something must be wrong with the grammar. Can you please post the entire grammar if possible, or a reduced version with function calls only? Something runnable, that I can load into Grammar Explorer and try.

Roman

Jan 8, 2010 at 10:14 PM

Okay, thanks. It turns out I seem to have downloaded an old version of Irony. I need to reinstall it so I can get the grammar explorer working. I need it to work in VS2010 and to produce a language service for VS2010. Is there an update to Ben Morrison's example that works for VS2010? If not, will it work to download the Irony sources and just update the solution and project files to VS2010? Sorry if these questions are ill-formed: I'm still unsure on how Irony works.

Coordinator
Jan 8, 2010 at 10:52 PM

Not only is there no update for VS 2010, but I'm afraid Ben's code wouldn't work with the current version of Irony. You'd have to tweak it a lot, I'm quite sure. Sorry for the inconvenience. It is still in my plans to introduce VS integration support directly in Irony, but so far I could not get to it, as I was busy with other "core" functionality. My best regards to Herman. We spoke at LangNet, so he probably remembers me. I'm watching CCI progress, but I decided to put off code generation for now and concentrate on a simple interpreter for the first release. Good luck, and let me know if you need any help!

Jan 13, 2010 at 9:59 PM

Well, I'm making Ben's code work in VS2010, but of course it is still just a wrapper on the old MPF/VSIP interfaces, so no big deal there. Herman says hello too.

I'm still working on the grammar I need, but in the meantime, I think I can reproduce the problem I asked about at the beginning of this thread. I took the ScriptDotNetGrammar.cs that comes in the Irony_All.sln and added the following:

 

NonTerminal vList = new NonTerminal("vList");
vList.Rule = MakePlusRule(vList, comma, v);

I then modified the rule for Statement to add this alternative:

| "call" + (vList + ":=" | Empty) + Expr + semicolon

Now when I load the grammar into the Irony Grammar Explorer, I get a shift-reduce conflict:

Shift-reduce conflict. State S39, lookaheads [Identifier]. Selected shift as preferred action.

My understanding of LALR(1) parsing is that it would be able to make the decision of whether to take the Empty alternative in the "call" sequence if it sees a comma or ":=" as the next symbol in the input after having a "v" on top of the stack. I am surprised to see a conflict reported.

Any help would be great,

Mike

 

Coordinator
Jan 13, 2010 at 10:49 PM
Edited Jan 13, 2010 at 10:50 PM

The main suspect here is the subexpression in parentheses. I advise against using constructs like this in general. Irony automatically transforms the rules to "normal" form, so it replaces this subexpression with a new NonTerminal. I suspect this non-terminal causes the trouble: the parser now has one extra decision to make (whether the non-terminal is empty or not).

Try the following - change the rule into two "plus" sequences without this parenthesized expression inside:

"call" + vList + ":=" + Expr + semicolon | "call" + Expr + semicolon;

Usually this helps in situations like this.
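
For anyone who wants something runnable to load into Grammar Explorer, here is a minimal sketch of a reduced grammar that uses the split rule. The class name CallGrammar and the argList and methodCall non-terminals are made up for illustration (the thread itself uses Expr from ScriptDotNetGrammar), and the sketch assumes an Irony version that has the Irony.Parsing namespace and the ToTerm, MakePlusRule, and MakeStarRule helpers.

```csharp
using Irony.Parsing;

// Hypothetical reduced grammar: only the two "call" statement forms.
public class CallGrammar : Grammar
{
    public CallGrammar()
    {
        var identifier = new IdentifierTerminal("identifier");
        var comma = ToTerm(",");
        var semicolon = ToTerm(";");

        var vList = new NonTerminal("vList");           // a, b, c
        var argList = new NonTerminal("argList");       // possibly empty argument list
        var methodCall = new NonTerminal("methodCall"); // M(x, y, z)
        var callStmt = new NonTerminal("callStmt");

        vList.Rule = MakePlusRule(vList, comma, identifier);
        argList.Rule = MakeStarRule(argList, comma, identifier);
        methodCall.Rule = identifier + "(" + argList + ")";

        // Two plain sequences, with no parenthesized optional part
        // between "call" and the rest of the statement.
        callStmt.Rule = "call" + vList + ":=" + methodCall + semicolon
                      | "call" + methodCall + semicolon;

        Root = callStmt;
    }
}
```

With this shape, both alternatives begin by shifting an identifier after "call", so the choice between them can wait until the parser sees ",", ":=", or "(".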

Jan 13, 2010 at 11:17 PM

Thanks! Yes, that fixes the problem. I have to admit that I am completely at a loss to understand the difference: it seems you have to figure out which alternative to take in the solution you provide and that decision is exactly the same as the decision as to whether that new nonterminal should allow the epsilon transition.

I also have a question about identity transforms. I think I've seen postings on the discussion list that indicate problems with writing "X.Rule = Y;". Is that still a problem? If so, does one fix it by writing "X.Rule = Y.Rule;"? (And if so, then one must make sure Y.Rule is initialized before this statement in the grammar's ctor.)

Coordinator
Jan 14, 2010 at 5:09 PM

Yes, the two grammars are equivalent if we compare the language(s) they generate, but they are not the same for the parser. It is easy to see with an example. As I said, the expression in parentheses is transformed into a new nullable non-terminal (let's call it NT0) by Irony when it converts all productions to the normal form. So the actual productions are:

callStmt.Rule = "call" + NT0 + Expr + semicolon;

NT0.Rule = Empty | vList + ":=";

Now the parser reads the line:

call x ....

The parser stops right after "call" and looks at "x" (the preview token). It has to make a decision: is the "x" it sees part of NT0, or is it part of Expr after NT0? In the first case it has to shift in order to parse NT0; in the second case it should reduce the empty production to create NT0. It cannot decide based on the preview token alone, hence the conflict. But if you switch to the alternative without NT0, there is no such decision point: the parser shifts "x" in either case and postpones the choice until it has seen more input, so everything is OK. As you see, this extra nullable non-terminal is the source of the trouble.
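
To see the decision point outside of Grammar Explorer, the sketch below writes the nullable helper out by hand and lists whatever the parser builder reports. It is only an illustration under some assumptions: the names NullablePrefixGrammar and ConflictCheck are invented here, a bare identifier stands in for Expr, and the code assumes an Irony version where LanguageData exposes an Errors list whose entries have Level and Message.

```csharp
using System;
using Irony.Parsing;

// Hypothetical reduced grammar that spells out the nullable helper (NT0)
// which the normalization step would otherwise create implicitly.
public class NullablePrefixGrammar : Grammar
{
    public NullablePrefixGrammar()
    {
        var identifier = new IdentifierTerminal("identifier");
        var comma = ToTerm(",");
        var semicolon = ToTerm(";");

        var vList = new NonTerminal("vList");
        var nt0 = new NonTerminal("NT0");
        var callStmt = new NonTerminal("callStmt");

        vList.Rule = MakePlusRule(vList, comma, identifier);

        // Nullable: right after "call", with an identifier as the preview token,
        // the parser must choose between reducing Empty and shifting into vList.
        nt0.Rule = Empty | vList + ":=";

        // A bare identifier stands in for Expr to keep the sketch small.
        callStmt.Rule = "call" + nt0 + identifier + semicolon;

        Root = callStmt;
    }
}

public static class ConflictCheck
{
    public static void Main()
    {
        // Constructing LanguageData builds the parser tables for the grammar.
        var language = new LanguageData(new NullablePrefixGrammar());

        // Print every problem the builder recorded (conflicts included, if any).
        foreach (var error in language.Errors)
            Console.WriteLine(error.Level + ": " + error.Message);
    }
}
```

Replacing the callStmt rule with the two-alternative form and removing NT0 should make the list come back empty for this reduced grammar, since the parser then shifts the identifier before it has to commit to either alternative.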

As for identity transforms: no, that is no longer a problem. It was fixed long ago.