Newbie context-free grammar question

Jul 9, 2014 at 10:36 PM
I suspect that if I understood context-free grammars better, I could answer my own question. Can two production rules start with the same terminal? Here's an example:
    public ParserSandbox1Grammar()
      : base(true)
    {
      // 1. Terminals

      // 2. Non-terminals
      var function = new NonTerminal("Function");
      var functions = new NonTerminal("Functions");
      var global = new NonTerminal("Global");
      var globals = new NonTerminal("Globals");
      var globalsBlock = new NonTerminal("GlobalsBlock");
      var symFile = new NonTerminal("SymFile");

      // 3. BNF rules
      function.Rule = ToTerm("foo") + "baz";
      functions.Rule = MakeStarRule(functions, function);
      global.Rule = ToTerm("foo") + "bar";
      globals.Rule = MakePlusRule(globals, global);
      globalsBlock.Rule = "(no globals)" | globals;
      symFile.Rule = globalsBlock + functions;
      Root = symFile;       // Set grammar root

      //automatically add NewLine before EOF so that our BNF rules work correctly when there's no final line break in source
      this.LanguageFlags = LanguageFlags.CreateAst | LanguageFlags.NewLineBeforeEOF;
    }
Here's the input:
foo bar
foo baz
...where "foo bar" is a "global" and "foo baz" is a "function". Irony tries to interpret "foo baz" as a global, not a function, even though 0 or more functions may follow globals in my example above.
Jul 10, 2014 at 1:03 AM
OK. Here's a little more complex example of the same situation:

Consider the following input file:
source foo
a global
static b global
static s function

source bar
a global
There are 1 or more "source" clauses. Each of these clauses has 1 or more "global"s and 0 or more "function"s. A global or function may start with the "static" keyword. I've parsed this input file with Antlr. Here's the Antlr grammar:
grammar Hello;
SOURCE   : 'source';
GLOBAL   : 'global';
STATIC   : 'static';
FUNCTION : 'function';
function : STATIC? ID FUNCTION;
global   : STATIC? ID GLOBAL;
source   : SOURCE ID global+ function*;
file     : source+;
ID : [a-z]+ ;             // match lower-case identifiers
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
The corresponding Irony grammar in C# is as follows:
  [Language("ParserSandbox1Grammar", "1.0", "ParserSandbox1 Grammar")]
  public class ParserSandbox1Grammar : Grammar
  {
    public ParserSandbox1Grammar()
      : base(true)
    {
      // 1. Terminals
      KeyTerm staticTerm = ToTerm("static", "staticTerm");
      NonTerminal staticTermOpt = new NonTerminal("staticTermOpt", Empty | staticTerm);
      var identifier = new IdentifierTerminal("Identifier");


      // 2. Non-terminals
      var global = new NonTerminal("Global");
      var function = new NonTerminal("Function");
      var source = new NonTerminal("Source");
      var file = new NonTerminal("SymFile");


      // 3. BNF rules
      function.Rule = staticTermOpt + identifier + "function";
      global.Rule = staticTermOpt + identifier + "global";
      source.Rule = "source" + identifier + MakePlusRule(source, global) + MakeStarRule(source, function);
      file.Rule = MakePlusRule(file, source);
      Root = file;       // Set grammar root

      //automatically add NewLine before EOF so that our BNF rules work correctly when there's no final line break in source
      LanguageFlags = LanguageFlags.CreateAst | LanguageFlags.NewLineBeforeEOF;
    }
  }
Antlr parses the input file just fine; however, Irony reports a syntax error here:
source foo
a global
static b global
         ^

"Syntax error, expected: function"
Irony also gives 4 conflicts. I can provide those if you'd like. Why is Irony insisting that this line be a "function" line? The "source" rule allows multiple globals.
Coordinator
Jul 10, 2014 at 1:41 AM
Understanding context-free grammars is not enough; you also have to know something about parsing methods such as LR, LALR(1), and LL. Irony is LALR(1), while Antlr is LL, and grammar rules have to be fine-tuned for the specific method. When Irony 'insists' on something 'wrong', it means it took one of two equally possible alternatives resulting from the ambiguities (conflicts!) that it reports. So there is no point trying to parse anything before you fix the conflicts. To do this, read more about LALR grammars.
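To make the advice about conflicts concrete, here is a minimal sketch of how the second grammar might be restructured for an LALR(1) parser. It is an illustration under assumptions, not a tested fix. The class name SymFileGrammarSketch and the extra non-terminals Globals and Functions are hypothetical and do not appear in the original post. The original rule passes `source` itself as the list non-terminal to MakePlusRule and MakeStarRule, which, as far as I can tell, makes the list rules refer back to `source`. The sketch gives each list its own non-terminal and writes "zero or more functions" as two alternatives instead of a star rule, so the parser never has to reduce an empty function list while the next token could still begin another global. The LanguageFlags line is left out to keep the example small. Working through the LALR(1) states by hand suggests this version has no conflicts, but it has not been run through Irony, so check it in Grammar Explorer before relying on it.

```csharp
using Irony.Parsing;

// Hypothetical class name, used only for this sketch.
public class SymFileGrammarSketch : Grammar
{
    public SymFileGrammarSketch()
        : base(true) // case-sensitive, as in the original grammar
    {
        // 1. Terminals
        var identifier = new IdentifierTerminal("Identifier");

        // 2. Non-terminals: every list gets its own non-terminal
        var staticTermOpt = new NonTerminal("staticTermOpt");
        var global = new NonTerminal("Global");
        var globals = new NonTerminal("Globals");       // assumption: not in the original
        var function = new NonTerminal("Function");
        var functions = new NonTerminal("Functions");   // assumption: not in the original
        var source = new NonTerminal("Source");
        var file = new NonTerminal("SymFile");

        // 3. BNF rules
        staticTermOpt.Rule = Empty | ToTerm("static");
        global.Rule = staticTermOpt + identifier + "global";
        function.Rule = staticTermOpt + identifier + "function";
        globals.Rule = MakePlusRule(globals, global);
        functions.Rule = MakePlusRule(functions, function);

        // "Zero or more functions" is spelled out as two alternatives,
        // so no empty function list is needed after the globals.
        source.Rule = "source" + identifier + globals
                    | "source" + identifier + globals + functions;
        file.Rule = MakePlusRule(file, source);

        Root = file; // Set grammar root
    }
}
```

With this shape, Global and Function share the prefix `staticTermOpt identifier`, and the parser only has to tell them apart when it reaches the "global" or "function" keyword, which is a single token of lookahead.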