For Gold Parser Fans: Simple2.grm in Irony

Apr 28, 2008 at 3:31 AM
Edited Apr 28, 2008 at 3:59 AM
If you are a fan of Gold Parser Builder, and are possibly expanding your horizons into alternative methods of creating parsers, I have here a classic example to give you some perspective on what you're looking at here. This is a complete parser implementation for the Simple2.grm file translated into irony code. Talk about simple! (Note that below I have a sample visitor and console doodad for you to run it if you feel the need to test'er out.) I apologize for the formatting I can't seem to make it stick right.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Irony.Compiler;

// example grammar translation by dave dolan, the original is available from http://devincook.com

// below in comments is the original Gold-Based Simple2 Grammar
/*"Name" = 'Simple'
"Author" = 'Devin Cook'
"Version" = '2.0'
"About" = 'This is a very simple grammar designed for use in examples'

"Case Sensitive" = False
"Start Symbol" = <Statements>

{String Ch 1} = {Printable} - ''
{String Ch 2} = {Printable} - "

Id = {Letter}{AlphaNumeric}*


StringLiteral = '' {String Ch 1}* ''
| '"' {String Ch 2}* '"'


NumberLiteral = {Digit}('.'{Digit})?

<Statements> ::= <Statement> <Statements>
| <Statement>

<Statement> ::= display <Expression>
| display <Expression> read ID
| assign ID '=' <Expression>
| while <Expression> do <Statements> end
| if <Expression> then <Statements> end
| if <Expression> then <Statements> else <Statements> end

<Expression> ::= <Expression> '>' <Add Exp>
| <Expression> '<' <Add Exp>
| <Expression> '<=' <Add Exp>
| <Expression> '>=' <Add Exp>
| <Expression> '==' <Add Exp>
| <Expression> '<>' <Add Exp>
| <Add Exp>

<Add Exp> ::= <Add Exp> '+' <Mult Exp>
| <Add Exp> '-' <Mult Exp>
| <Add Exp> '&' <Mult Exp>
| <Mult Exp>

<Mult Exp> ::= <Mult Exp> '*' <Negate Exp>
| <Mult Exp> '/' <Negate Exp>
| <Negate Exp>

<Negate Exp> ::= '-' <Value>
| <Value>

<Value> ::= ID
| StringLiteral
| NumberLiteral
| '(' <Expression> ')'
*/

namespace DolanIronyTest {
class Simple2Grammar : Irony.Compiler.Grammar {

public Simple2Grammar() {

// Terminals
Terminal NumberLit = new NumberLiteral("NUMBER-LITERAL");

Terminal SingleQuotedString = new StringLiteral("SINGLE-QUOTED-STRING","'", ScanFlags.DisableEscapes);
Terminal DoubleQuotedString = new StringLiteral("DOUBLE-QUOTES-STRING", "\"", ScanFlags.DisableEscapes);

Terminal Variable = new IdentifierTerminal("VARIABLE");

// Non-Terminals
NonTerminal StringLit = new NonTerminal("STRING-LITERAL");
NonTerminal StatementList = new NonTerminal("STATEMENT-LIST");
NonTerminal Statement = new NonTerminal("STATEMENT");
NonTerminal DisplayStatement = new NonTerminal("DISPLAY-STMT");
NonTerminal InputDisplayStatement = new NonTerminal("INPUT-DISPLAY-STMT");
NonTerminal AssignmentStatement = new NonTerminal("ASSIGNMENT-STMT");
NonTerminal IfThenStatement = new NonTerminal("IF-THEN-STMT");
NonTerminal IfThenElseStatement = new NonTerminal("IF-THEN-ELSE-STMT");
NonTerminal WhileStatement = new NonTerminal("WHILE-STMT");
NonTerminal Expression = new NonTerminal("EXPRESSION");
NonTerminal BinaryOperator = new NonTerminal("BINARY-OPERATOR");
NonTerminal UnaryOperator = new NonTerminal("UNARY-OPERATOR");

// start rule, and case sensitivity
this.Root = StatementList;
this.CaseSensitive = false;


// I think there is a better way to do this, but this works for now
StringLit.Rule = SingleQuotedString | DoubleQuotedString;

Expression.Rule = NumberLit
| StringLit
| Variable
| Expression + BinaryOperator + Expression
| UnaryOperator + Expression
| "(" + Expression + ")";

BinaryOperator.Rule = Symbol("+")
| "-"
| "&"
| "*"
| "/"
| ">"
| "<"
| ">="
| "<="
| "=="
| "<>";

UnaryOperator.Rule = "-";
// this sure as heck beats gold for implementing right recursion.
StatementList.Rule = MakePlusRule(StatementList, null, Statement);

// statements can be any of the possible statements, easy enough
Statement.Rule = DisplayStatement
| InputDisplayStatement
| AssignmentStatement
| WhileStatement
| IfThenStatement
| IfThenElseStatement;

// just testing to see that it works as Symbol("display") and/or the lonely string
DisplayStatement.Rule = Symbol("display") + Expression;

InputDisplayStatement.Rule = "display" + Expression + "read" + Variable;

AssignmentStatement.Rule = "assign" + Variable + "=" + Expression;

WhileStatement.Rule = "while" + Expression + "do" + StatementList + "end";

IfThenStatement.Rule = "if" + Expression + "then" + StatementList + "end";

IfThenElseStatement.Rule = "if" + Expression + "then" + StatementList + "else" + StatementList + "end";

// operator precedence
RegisterOperators(1, ">", "<", "<=", ">=", "==", "<>");
RegisterOperators(2, "+", "-", "&");
RegisterOperators(3, "*", "/");

// trim the tree when it comes to parentheses
RegisterPunctuation("(", ")");


}

}
}


Here's a sample 'printing visitor' for traversing the tree to prove that I've done it correctly.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Irony.Compiler;

namespace DolanIronyTest {
class PrintVisitor : Irony.Compiler.IAstVisitor{
#region IAstVisitor Members

int IndentLevel = 0;

void Indent() {
IndentLevel++;

}

void UnDent() {
IndentLevel--;

}

int _NodeCount = 0;
public int NodeCount {
get {
return _NodeCount;
}
}

public void BeginVisit(Irony.Compiler.AstNode node) {

for (int x = 0; x < IndentLevel; x++) {
Console.Out.Write("--");
}

Console.Out.WriteLine(node.ToString());

Indent();

if (node.ChildNodes != null) {
foreach (AstNode kid in node.ChildNodes) {
BeginVisit(kid);
}
}

EndVisit(node);
}

public void EndVisit(Irony.Compiler.AstNode node) {
_NodeCount++;
UnDent();
}

#endregion
}
}

And of course the harness to run it by to gain that all important instant gratification.

using System;
using System.Collections.Generic;
using System.Text;
using Irony.Compiler;

namespace DolanIronyTest {
class Program {
static void Main(string[] args) {

Simple2Grammar grammar = new Simple2Grammar();
LanguageCompiler compiler = new LanguageCompiler(grammar);

string prog = "display 'give me a number for a>' read a\n" +
"display 'give me a number for b>' read b\n" +
"display 'a + b = '" +
"display (a + b)";

AstNode rootNode = compiler.Parse(prog);

PrintVisitor visitor = new PrintVisitor();

visitor.BeginVisit(rootNode);

Console.Out.WriteLine("\nTotal Nodes Touched: {0}", visitor.NodeCount);

Console.ReadKey();

}
}
}

Just in case you don't want to build and run it, here's the output of the test harness:

STATEMENT-LIST
--INPUT-DISPLAY-STMT
----display [VARIABLE]
----give me a number for a> [SINGLE-QUOTED-STRING]
----read [VARIABLE]
----a [VARIABLE]
--INPUT-DISPLAY-STMT
----display [VARIABLE]
----give me a number for b> [SINGLE-QUOTED-STRING]
----read [VARIABLE]
----b [VARIABLE]
--DISPLAY_STMT
----display [VARIABLE]
----a + b = [SINGLE-QUOTED-STRING]
--DISPLAY_STMT
----display [VARIABLE]
----EXPRESSION
------a [VARIABLE]
------+ [Symbol]
------b [VARIABLE]

Total Nodes Touched: 20


Notice how the parentheses in the "display (a + b)" statement don't show up in the tree. I'll assume that's because I've registered them as punctuation. Neat!
Coordinator
Apr 28, 2008 at 10:16 PM
Thanks for an interesting post!
To complete the comparison, here is an article describing implementation of the same grammar in Gold parser:
http://www.codeproject.com/KB/recipes/IntrotoGoldParser.aspx
I should note that article's code provides an implementation of a simple interpreter on top of Gold-generated parser.
The support for interpreter in Irony is coming - very soon!
Roman
Apr 29, 2008 at 2:22 AM
I'm anxious to see it of course, but I will point out that you're no further behind than gold is in this respect. The interpreters/post processors are not included in gold either. That example you're citing just has another dedicated citizen behind it showing how it's all tied together.

Getting from trees of productions in full recursive decent style (raw parse trees) to abstract syntax trees in a gold generated parser, even with a code template like the ones I've written, still isn't easy. In fact, I'd say the unwillingness of the population to make assumptions about similarities in certain language constructs between languages is the only reason that such a tool hasn't been built to date.

I had a go with a friend of mine, kelly parker, at a rough draft of an AST/interpreter tree generator, but we kludged a lot, and never got it to the point where it could be released. We actually used tricks like terminal naming conventions and defining key non-terminals/terminals in a config file before generation that 'helped' the AST be generated. It did work, but it was quite contrived when it did, and frankly it's all spaghetti. Kelly and I worked on separate examples at first, and well I was done last, so I mixed some of my code over top of kelly's.

The one thing about gold that made such a tool sort of feasible was that it generates a nice abstracted DFA table set. All we had to do to enumerate the terminals and whatnot was load it up with a binary reader, and dive through it, creating class templates (with code dom) from each of the non-terminals. Then the children were generated as member fields. The one concept that's hard to embody in such a tool though, and we just sort of winged it, was the inheritance. Clearly you have inheritance when you're talking about variations on the similar kind of non-terminals when each can have multiple rules, each derives from the abstract Left-hand-side of the rule. It might actually be easier to do something like this with Irony because of the way that expressions can be flattened instead of creating precedence based polymorphic hierarchies. The rest though would be pretty much the same in concept I believe.

I'm rambling, now... so I'll stop.