Building ASTs and using a symbol table

Jun 9, 2010 at 1:43 PM
Edited Jun 9, 2010 at 1:47 PM


I've got a working Irony grammar, but I'm having trouble building my AST. To build it, I need a symbol table to resolve names. I think I'm probably just trying the wrong approach, and hope someone can suggest the right way to do things.

The grammar itself is trivial; it's a SQL-like template language like this;

    SELECT <columns tbl.Id tbl.Key> from <table tbl>

It was wonderfully easy to construct a parse tree that i'm very happy with;

    (LITERAL (columnList (columnRef ID ID) (columnRef ID ID))

    LITERAL (tableref ID))

So far, so good. Now I want to build my AST. When I reach a 'columnRef' term like this;

    <column tbl.col>

I want to add symbols for the 'tbl' table and the 'col' column to a global symbol table. I don't see the right way to attach the symbol table to the parser before I begin parsing.

I've tried this;

    var grammar = new MyGrammar();
    var parser = new Parser(grammar);
    // attach a new symbol table
    var symbolTable = new MySymbolTable();
    parser.Context.Values.Add("symbol-table", symbolTable);
    // now parse

And then try to look in the context for the symbol table on AstNode.Init():

    // overridden Init() on ColumnRefNode
    public override void Init(ParsingContext context, ParseTreeNode treeNode)
        base.Init(context, treeNode);

        // get the table and column names
        // from the ID terminals
        var tableName = treeNode.ChildNodes[0].Token.Value.ToString();
        var columnName = treeNode.ChildNodes[1].Token.Value.ToString();

        // get the symbol table from the context
        var st = (SymbolTable)context.Values["symbol-table"];
        var table = st.ResolveTable(tableName);
        var column = table.ResolveColumn(columnName);

But when the parser starts it clears the Values collection, and the symbol table is lost.

Can anyone tell me how they go about creating and attaching symbol tables, scope trees, and that sort of thing to the parser? And can you do it so that the grammar explorer continues to work?

Many thanks,

   Steve Cooper


Jun 9, 2010 at 2:40 PM
Edited Jun 9, 2010 at 2:48 PM


The best approach I've found is to create your own classes derived from AstNode, and then reference them selectively in your grammar constructor when you create non-terminals.  Irony will automatically instantiate your class and will call the virtual "Init" method, which you can then override in your class.

For example, I have a grammar that implements a simple scripting language.  The non-terminals look something like the following:

var Script = new NonTerminal("Script", typeof(Script));
var Statements = new NonTerminal("Statements");
var Statement = new NonTerminal("Statement");
var ActivitiesList = new NonTerminal("ActivitiesList", typeof(ActivitiesList));

and so on.  Note that you only need to specify a class for the non-terminals you want converted to AST nodes.  Then, in your derived AstNode.Init method, you have to locate the appropriate child nodes and add them as child nodes to the node that Irony creates for you.

The trick (there's always a trick) is that Irony "unfolds" the parse tree from the inside out.  So by the time the Init method is called, the children of the ParseTreeNode instance should have their AstNode members already initialized.  Thus, it's kind of an "all or nothing" proposition.  In other words, you have to know in advance how you want your AST to look, so you can go through your non-terminals and specify the classes properly.  Once you've done that, building the AST is a snap.

I've been thinking to write this up and add it to Roman's new WikiBook, because lots of people seem to struggle with AST building.

One more thing - I've found that the "top-level" node typically requires some kind of "fixup".  The best tool for this is the AcceptVisitor method that lets you easily traverse the AST to locate items that need to be relocated, etc. before additional processing.  This also happens in the Init method.  For example, typically the parent references are incorrect (again, because the AST nodes are created from the "inside out").  To deal with this, I use the following code in the top-level AST node.

public class Script : AstNode
   public override void Init(ParsingContext context, ParseTreeNode treeNode)
      base.Init(context, treeNode);
      AcceptVisitor(new FixupParentReferences());
      //...more cool stuff here...

   internal class FixupParentReferences : IAstVisitor
      public void BeginVisit(AstNode node)
         foreach (AstNode child in node.ChildNodes)
      public void EndVisit(AstNode node) {}

Hope this helps,


Jun 9, 2010 at 8:32 PM
About clearing context.Values collection - this is a valid point, there must be some way to pass some shared information down to each node, and I think this collection was created for this - and also to allow exchanging data between nodes. I will look into this, probably this collection should not be cleared automatically. For now you can just make modifications yourself in Irony sources.
Jun 10, 2010 at 9:00 AM


Thanks -- this helps clear things up. Here's how I understand it;

The post-order traversal of the nodes makes sense when building a typed tree, but it's probably not the place to do much work.

In many languages, for instance, variables need to be defined before they are used. Eg;


int i;
if (cond)
  i = 0;


We'd get an AST like this;


  ((define 'i' 'int')
   (if ('cond')
    (assign 'i' 0)))

So if I'm initializing my 'assign' node before my 'define' node, then I can't do symbol table lookups yet; the 'i' in the assign node hasn't been defined yet.

It seems to me that I am probably trying to do too much work when building the AST. What I should do is simply build the tree with the right types, and store off important token strings or child nodes. Later, I'll walk the tree and try to do something sensible with the info stored there. In the example above, then, all I'd do during AST construction is save the string 'i' in an instance of AssignAstNode, and save the 'i' and 'int' string in a DefineAstNode. Later, I'll visit all the nodes in pre-order, which will expose the nodes in the right order for symbol lookup.

Does that sound like the right approach?


I think it is right to clear the context -- it stops things from a previous parse affecting the next parse -- but there has to be a step after it has been cleared when the grammar author can act. One option might be to provide some virtuals for lifecycle events, like OnParseStart(context), OnParseFinish(context, parseRoot), and OnAstCreated(context, astRoot). When the parse starts, the grammar can initialise the symbol table there. I haven't really thought it through, but I thought I'd suggest it.