Token Preview -based Semi-Automatic Conflict Resolution

(first version completed, see here: http://irony.codeplex.com/discussions/267217)

Suggested research/implementation project.

There is one very common type of reduce/reduce conflict that comes up in languages like c# and java.
I'll show it using a simple language as an example. The language recognizes a list of declarations, for ex:

    public readonly string SomeField;
    private int SomeProp {}; 
    public object SomeMethod(){};

Here is an Irony grammar for this language:

  public class ConflictResolutionTestGrammar : Grammar {
    public ConflictResolutionTestGrammar() : base(true) {
      var name = new IdentifierTerminal("id");

      var stmt = new NonTerminal("Statement");
      var stmtList = new NonTerminal("StatementList");
      var fieldModifier = new NonTerminal("fieldModifier");
      var propModifier = new NonTerminal("propModifier");
      var methodModifier = new NonTerminal("methodModifier");
      var fieldModifierList = new NonTerminal("fieldModifierList");
      var propModifierList = new NonTerminal("propModifierList");
      var methodModifierList = new NonTerminal("methodModifierList");
      var fieldDef = new NonTerminal("fieldDef");
      var propDef = new NonTerminal("propDef");
      var methodDef = new NonTerminal("methodDef");

      //Rules
      this.Root = stmtList;
      stmtList.Rule = MakePlusRule(stmtList, stmt);
      stmt.Rule = fieldDef | propDef | methodDef;
      fieldDef.Rule = fieldModifierList + name + name + ";";
      propDef.Rule = propModifierList + name + name + "{" + "}";
      methodDef.Rule = methodModifierList + name + name + "(" + ")" + "{" + "}";
      fieldModifierList.Rule = MakeStarRule(fieldModifierList, fieldModifier);
      propModifierList.Rule = MakeStarRule(propModifierList, propModifier);
      methodModifierList.Rule = MakeStarRule(methodModifierList, methodModifier);
      
      // That's the key of the problem: 3 modifiers have common members
      //   so parser automaton has hard time deciding which modifiers list to produce - 
      //   is it a field, prop or method we are beginning to parse?
      fieldModifier.Rule = ToTerm("public") | "private" | "readonly" | "volatile";
      propModifier.Rule = ToTerm("public") | "private" | "readonly" | "override";
      methodModifier.Rule = ToTerm("public") | "private" | "override";

      MarkReservedWords("public", "private", "readonly", "volatile", "override");   
    } 

Each statement type is defined the same way: modifiers list followed by typename, name, finally ending with either ";", "{}", or "(){}". The trouble is that modifiers are different for each statement type - share some keywords, but each is different. Naturally, we define different -List and -Modifier non-terminals for modifiers. And here comes the conflict: when parser sees the first word of a statement, like "public", it cannot know which modifierList it belongs to, and which non-terminal to reduce, out of 3.
Here's grammar error in S0, the same is in S1:
Reduce-reduce conflict. State S0, lookaheads: id public private readonly override. Selected reduce on first production in conflict set.

Here are reeduce items in state S0 that are in conflict:
    fieldModifierList -> · [id public private readonly volatile]
    propModifierList -> · [id public private readonly override]
    methodModifierList -> · [id public private override]

The idea for the fix is simple: make Parser run ahead and check the first token it finds out of the set [ ; ( { ]. If it finds ";" first out of 3, it is field definition, if { then it is property, and so on.
We need a way to provide parser with these hints, and let parser builder know that we're OK, the conflict is resolved.

(important: the implementation section was edited on 07/20/2011)
Implementation suggestion: implement a grammar hint TokenPreviewHint, with two parameters:
  • ComesFirst - a symbol to find first;
  • BeforeSymbols - before these symbols; we can use either array (params array) or space-delimited list in a single string.
When this hint is used in production in Shift position (not at the end), it is "shift-if" hint; correspondingly, when the hint is at the end of production, it is "reduce-if" instruction.
Two helper methods can be added, that return this hint:
  • ShiftIf(comesFirst, beforeSymbols); - to use in shift position.
  • ReduceIf(comesFirst, beforeSymbols); - to use in reduce position.
Note that the example we just looked at (modifiers list) is in fact a special case. When we create lists using MakePlusRule or MakeStarRule, we do not form the productions explicitly, these methods do this for us. So for this case, we need some additional way to "inject" these hints. Probably, some helper method ReduceIf on non-terminal that adds this hint to all productions in a non-terminal:

     fieldModifierList.ReduceIf(";", "; { ("); 
     propModifierList.ReduceIf("{", "; { ("); 
     methodModifierList.ReduceIf("(", "; { ("); 
The first parameter is a symbol to find first, out of symbols/terminals listed in the second parameter (as a space delimited string). Note that we need different ways to specify terminals - as string literals like shown above, or as references to terminal variables used in grammar definitions.
Another thing - we may need to provide extra symbol(s) marking when to stop searching so that preview does not go too far - in case we are parsing source with errors. Example: user is typing text in editor, missed one symbol, while highlighter is parsing the text - we do not want parser to run too far. In our case, the termination symbols might be " or ")";
Implementing this behavior is step one - when user knows how to resolve the conflict, and he provides explicit instructions. More difficult is next step: actually verifying that with provided hints the conflict is gone for all cases. And finally, it would be nice to have automatic advice: when user runs into conflict like this the first time, Irony automatically discovers the way to resolve it using this technique, and give an advise to the user, including the symbols to use in resulution.

Last edited Aug 2, 2011 at 9:24 PM by rivantsov, version 15

Comments

No comments yet.