Problem With Parser (Nearly Duplicate Productions)

Feb 25, 2010 at 11:48 PM

Hey Roman,

I have run into a problem with the Parser where it will not parse a perfectly valid expression.

First of all, here is the code for the grammars that are causing the problem:

public class TestGrammar1 : Grammar
{
   public TestGrammar1()
   {
      var number = new NumberLiteral("Number");

      var foo = new NonTerminal("Foo");
      var bar = new NonTerminal("Bar");
      var sum = new NonTerminal("Sum");

      this.Root = foo;

      foo.Rule = "Foo" + ToTerm("(") + bar + "," + sum + ")";
      bar.Rule = "Bar" + ToTerm("(") + number + ")";
      sum.Rule = "Sum" + ToTerm("(") + number + ")";

      MarkPunctuation("(", ",", ")");
   }
}

public class TestGrammar2 : Grammar
{
   public TestGrammar2()
   {
      var tg1 = new TestGrammar1();

      var number = new NumberLiteral("Number");

      var theRoot = new NonTerminal("TheRoot");
      var foo = new NonTerminal("Foo");
      var bar = new NonTerminal("Bar");

      this.Root = theRoot;
      theRoot.Rule = foo | tg1.Root;
      foo.Rule = "Foo" + ToTerm("(") + bar + "," + number + ")";
      bar.Rule = "Bar" + ToTerm("(") + number + ")";

      MarkPunctuation("(", ",", ")");
   }
}

I separated the grammars into two classes since that is the way my current (much, MUCH more complex grammars) are set up (with one using the other as a sub grammar).
I need the functionality to be able to parse an expression from the first grammar (in this case, TestGrammar1), as well as an expression from the second grammar
(in this case, TestGrammar2), without having to re-write the entire TG1 inside of TG2 (for maintenance purposes!).  Also, since they have conceptually
different functionality (when evaluating their ASTs), I would prefer that they remain in separate grammars rather than turning TG1 into a "snippet" root of TG2.

In this case, when parsing an expression using TestGrammar2, it should be able to parse expressions of the form:
Foo(Bar(5), Sum(7))   <-- Production from TestGrammar1
Foo(Bar(5), 7)   <-- Production from TestGrammar2

It can correctly parse the expression from TG1, but when it tries to do the one from TG2, it gives the following Parser Error:
(0:12) Syntax error, expected: Sum

Clearly, it is a valid production and should have no problem.

As far as I can tell, the problem seems to come from the fact that both Foo statements require a Bar statement as the first "parameter", but the NTs representing
the Bar statements are two separate NTs with the same production.

Brian

Coordinator
Feb 26, 2010 at 12:04 AM

I think your problem is in two versions of Number literals participating in the final Grammar #2. They look the same but they are distinct and different objects for parser. So just by accident Number#1 is the first in scanning list, it gets to parse the token "5", so after this the parser is directed into the first grammar's production (because number terminal is from the first grammar). You need to make NumberLiteral (and possibly other terms) in Grammar1 public fields, and reuse them in Grammar2 rules.

Mar 1, 2010 at 5:19 PM
Edited Mar 1, 2010 at 6:27 PM

Roman,

I was able to find two different ways to make it work.  The first is by "sharing" both the Bar NT and the open parenthesis key term (both of these are necessary):

public class TestGrammar1 : Grammar
{
   public NonTerminal Bar = new NonTerminal("Bar");
   public KeyTerm Key_LP = null;

   public TestGrammar1()
   {
      Key_LP = ToTerm("(");

      var number = new NumberLiteral("Number_1");

      var foo = new NonTerminal("Foo_1");
      var sum = new NonTerminal("Sum");

      this.Root = foo;

      foo.Rule = "Foo" + Key_LP + Bar + "," + sum + ")";
      Bar.Rule = "Bar" + Key_LP + number + ")";
      sum.Rule = "Sum" + Key_LP + number + ")";

      MarkPunctuation("(", ",", ")");
   }
}

public class TestGrammar2 : Grammar
{
   public TestGrammar2()
   {
      // Grammar.CurrentGrammar is switched to TG1!
      var tg1 = new TestGrammar1();
      // But we cannot switch it back -- it is read only!

      var number = new NumberLiteral("Number_2");

      var theRoot = new NonTerminal("TheRoot");
      var foo = new NonTerminal("Foo_2");

      this.Root = theRoot;
      theRoot.Rule = foo | tg1.Root;
      // TG1.ToTerm("Foo") is really called here
      foo.Rule = "Foo" + tg1.Key_LP + tg1.Bar + "," + number + ")";

      // NOTE: This "(" and tg1.Key_LP are separate KeyTerms
      MarkPunctuation("(", ",", ")");
   }
}

The ugly thing about this solution is that the open paren has to be shared (but oddly, "Foo", number, and the closed paren do not!).  I was confused at first
because it looked like both "Foo" and "(" were being passed into the ToTerm(...) function (whether explicitly, or as part of processing the '+' operator), so I
didn't understand why "(" had to be shared, but not "Foo"!

It turns out that "(" is passed to the ToTerm function of the Grammar that has the constructor that is currently executing, which leads to two different KeyTerms
being made for "(": one for TG1, and another for TG2 (since they cannot see each other's list of already defined KeyTerms).  HOWEVER, when "Foo" is passed
to ToTerm(...), it is passed to TG1's ToTerm(...) function, resulting in only one KeyTerm being created for "Foo" (since it was 'already defined in TG1').
This is due to the fact that the line:

var tg1 = new TestGrammar1();

causes Grammar.CurrentGrammar to change from TG2 to TG1.  Since Grammar.CurrentGrammar is readonly, I cannot change it back.

I can see things getting very complicated (and incorrect!) if more than one grammar is used as a "sub grammar".  (For example, if I added a third "parameter" to
Foo that was the root of TestGrammar999.)

The other "solution" to the problem is to use inheritance; TG4 inherits from TG3:

public class TestGrammar3 : Grammar
{
   protected NonTerminal m_Bar = new NonTerminal("Bar");

   public TestGrammar3()
   {
      var number = new NumberLiteral("Number_3");

      var foo = new NonTerminal("Foo_3");
      var sum = new NonTerminal("Sum");

      this.Root = foo;

      foo.Rule = "Foo" + ToTerm("(") + m_Bar + "," + sum + ")";
      m_Bar.Rule = "Bar" + ToTerm("(") + number + ")";
      sum.Rule = "Sum" + ToTerm("(") + number + ")";

      MarkPunctuation("(", ",", ")");
   }
}

public class TestGrammar4 : TestGrammar3
{
   public TestGrammar4()
   {
      // Grammar.CurrentGrammar is never changed
      var tg3 = this.Root;

      var number = new NumberLiteral("Number_4");
      var theRoot = new NonTerminal("TheRoot");
      var foo = new NonTerminal("Foo_4");

      this.Root = theRoot;
      theRoot.Rule = foo | tg3;
      // The "(" KeyTerm from TG3 is reused here since the
      // list of KeyTerms is now "shared".
      foo.Rule = "Foo" + ToTerm("(") + m_Bar + "," + number + ")";

      MarkPunctuation("(", ",", ")");
   }
}

The benefit is that it isn't as ugly since I don't need to share the open paren KeyTerm and that "Foo" is being passed to the correct ToTerm(...) function (because
Grammar.CurrentGrammar isn't changing).  However, it is an ugly software design since TG4 isn't really a "more specific type of" TG3.  It just USES TG3.

Also, inheritance is not possible when more than one grammar needs to be used as a "sub grammar".

For now, I will probably go with the inheritance based solution as it is the "lesser of the two uglies".  However, I strongly believe that there is a problem with the current
implementation of how Grammar.CurrentGrammar is used, especially in regards to the ToTerm(...) function.

Perhaps the list of KeyTerms should be static so that it can be globally shared?  One thing to keep in mind is case sensitivity of grammars...

Brian