AST building in last version (2012_01_23)

Feb 28, 2012 at 8:27 AM

I had implemented a language grammar (CQL, part of the OGC CSW 2.0.2 standard) for my project (GeoSIK) using the October 2011 version of Irony. Having noticed that a new version of Irony had been pushed to the download page, I tried to update my dependency, and here are some questions/suggestions about it.

  • I think it would be better to archive old versions of Irony (naming them alpha1, alpha2, or CTP1, CTP2...):
    • my project is a library (as opposed to an application), so I don't think it is wise for me to make the Irony assemblies part of my releases.
    • the current release process for Irony means that my project must be kept up to date, so that the users of my library can have access to the right version of Irony. Old versions of my library become unusable, though.
    • it is very difficult to get notified of new versions right now...
    • what about delivering NuGet packages (I can help about this)?
  • CQL is a query language. I use Irony to parse the input, create an AST and build a LINQ expression from this. As part of this use case:
    • I love that I now can specify the AST node type in the Terminal constructor.
    • the default node types specifications (DefaultIdentifierNodeType, DefaultLiteralNodeType) having disappeared, I have had to add node types to all literals and identifiers...
    • much worse (IMHO) is the new handling of operators. To be able to define an operator usable by a BinaryOperationNode, I used to simply type the following
      OperatorMappings.Add(equals_operator.Text, ExpressionType.Equal, 10);
      Now OperatorMappings have disappeared, I have to:
      • create and store a new OperatorHandler:
        _OperatorHandler=new OperatorHandler(true);
        var oid=_OperatorHandler.BuildDefaultOperatorMappings();
        oid.Add(equals_operator.Text, ExpressionType.Equal, 10);
      • override the BuildAst method so that the AstContext is an instance of InterpreterAstContext (or else BinaryOperationNode is unusable, cf. line 42) with the operator handler defined above:
        public override void BuildAst(LanguageData language, ParseTree parseTree)
        {
            if (!LanguageFlags.IsSet(LanguageFlags.CreateAst))
                return;
            var astContext=new InterpreterAstContext(language, _OperatorHandler);
            var astBuilder=new AstBuilder(astContext);
            astBuilder.BuildAst(parseTree);
        }
    • I wish there was an easier way...

Maybe there is another way, or maybe I am not using Irony properly (as you know, the documentation is scarce). Or maybe these are simply oversights (bugs?). Please let me know of anything I might be doing wrong. If you have the courage, my set of changes is accessible here.

Do not get the wrong impression: I think your work is a truly great piece of software, and I am only complaining because I am using (and loving) it. Keep up the good work.

And let me know if you want help on the NuGet packaging.

Mar 2, 2012 at 6:28 PM


Sorry for your troubles. Let me explain a few things, or better said, present some excuses for why things are the way they are.

You have started something serious, and for a period of time - yes, I think you should include the Irony assembly in your distribution. One reason is that you use the AST functionality. The latest change you're talking about is when I finally completed a long-planned goal: to separate the general AST and interpreter infrastructure from core Irony (pure parsing). I tried to keep the parsing part stable and backward compatible, while I did not care so much about the AST and interpreter API - I just did not expect anybody was using it for serious stuff, and at the low-level API. At least, I thought, people were using Evaluator with some customizations, and all API changes underneath would not affect them. Apparently I was wrong, and you are the frustrated victim. Sorry again, but this move had to be done anyway; it was long overdue. The current structure (I believe) is much better.

For the future, I expect the AST/interpreter stuff still to be relatively unstable. So when you take a new version, see some major changes and see that the convenient ways you used to do things have disappeared, contact me - maybe the API just moved, or it may be an oversight on my part. But keep shipping a particular version of Irony with your code anyway, for now. I expect the parsing part to be stable and will only add extra facilities, but I cannot say the same about the AST and interpreter.

In my opinion, it is too early to move Irony distribution to NuGet; we're not at the 1.0 point yet. The trouble is that I'm 100% busy with other stuff at my day job, and with another project I have, which is much more urgent for me now. Irony is on hold - sorry folks. I can only occasionally answer urgent questions, and I have not touched the code in weeks, if not months.

Now, about the things you do. First, it's on my to-do list to provide expression tree output from the parse tree as a standard facility. The latest changes were done with these plans in mind. Why don't you look at this - how to add a few things inside Irony to produce expression tree output?

About default node types: they moved to AstContext, so you can set them there.

Modifying operator mappings - InterpretedLanguageGrammar has a method CreateRuntime. The idea was that this would be a customization point for things like operator mappings. You would call the base method and then tweak the runtime object - change operator mappings, for example. It is possible that some API methods are missing there - that is an oversight, the result of rushed refactoring. Add the method(s) and let me know - I'll fix it.



Mar 5, 2012 at 12:52 PM

Thanks for your answer.

And don't be too sorry: I have a much clearer overview of the project now, which is what I (and surely others) lacked the most. I perfectly understand that the AST functionality is (and will be) unstable, and now I understand why.

I still wish old versions of Irony could be downloaded from CodePlex though (albeit with disclaimers about the early stage the project is in, and about the likelihood of future breaking changes). Suppose a developer wanted to use Irony AND my library, and suppose I have not had the time to upgrade my code to the latest version of Irony: he will be able to pick compatible Irony assemblies from my project, but what about the Grammar Explorer for instance (which I found very useful)? Should I release it as well?

As for the creation of an expression tree, I have now gained a bit of experience as part of my project, which I may be able to use to try and help you. This may not be the right place to discuss these technical things, so do not hesitate to fork this discussion somewhere else if you want to. The main problem I have had to deal with was with the handling of Identifiers. The language I have implemented (CQL) is a SQL-Like language (the query part, after the WHERE clause). Identifiers are mapped to fields or properties in an in-memory, object representation of the underlying data (think entities / Linq to Entities). Suppose I have an object like this:

public class Person
{
    public string Name { get; set; }
    public IList<string> FirstName { get; }
}

  • Using an AST, simple expressions can be built very easily. WHERE NAME='foo' translates into p => p.Name=="foo". No problem:
    • I take the operator left argument: mapping from NAME to p.Name.
    • I take the right argument: the string "foo".
    • I combine the two with the operator. Return this node. Done.
  • But when lists are involved, you see that WHERE FIRSTNAME='foo' now translates into p => p.FirstName.Any<string>(s => s=="foo").
    • The whole algorithm has to be different now. Something has to detect that one of your arguments is a list and build the Any method:
      • Maybe the mapper (first step above). Then it should have access to the right argument and to the operator, as they are part of the Any method. Then steps 2 and 3 should be skipped.
      • Maybe the operator. But the expression node that is returned is not the operator in this case, it is the left argument (the list): the operator is hidden inside the Any method...
    • Now think of what happens if the left argument AND the right argument are list identifiers: 2 nested Any methods, the operator being implemented in the innermost one...
  • All this complexity would not occur in a language that was more descriptive and closer to C# (i.e. a language that handled lists explicitly, like C#).
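To make the two cases above concrete, here is a minimal sketch using System.Linq.Expressions. It is not the code from my library: the CqlPredicates class and its two method names are made up for this illustration, and the FirstName property gets an initializer only so that the sample can run on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Person
{
    public string Name { get; set; }
    public IList<string> FirstName { get; } = new List<string>();
}

// Hypothetical helper, for illustration only.
public static class CqlPredicates
{
    // Scalar case: builds p => p.<property> == value
    public static Expression<Func<Person, bool>> ScalarEquals(string property, string value)
    {
        var p = Expression.Parameter(typeof(Person), "p");
        var body = Expression.Equal(Expression.Property(p, property), Expression.Constant(value));
        return Expression.Lambda<Func<Person, bool>>(body, p);
    }

    // List case: builds p => p.<property>.Any(s => s == value)
    public static Expression<Func<Person, bool>> ListAnyEquals(string property, string value)
    {
        var p = Expression.Parameter(typeof(Person), "p");
        var s = Expression.Parameter(typeof(string), "s");
        // The operator ends up inside the inner lambda, not at the root.
        var inner = Expression.Lambda<Func<string, bool>>(
            Expression.Equal(s, Expression.Constant(value)), s);
        // Resolves Enumerable.Any<string>(IEnumerable<string>, Func<string, bool>)
        var any = Expression.Call(typeof(Enumerable), "Any", new[] { typeof(string) },
            Expression.Property(p, property), inner);
        return Expression.Lambda<Func<Person, bool>>(any, p);
    }
}

public static class Program
{
    public static void Main()
    {
        var person = new Person { Name = "foo" };
        person.FirstName.Add("bar");
        Console.WriteLine(CqlPredicates.ScalarEquals("Name", "foo").Compile()(person));
        Console.WriteLine(CqlPredicates.ListAnyEquals("FirstName", "foo").Compile()(person));
    }
}
```

In the list case the root of the resulting tree is the Any call and the equality sits in the inner lambda, which is why the "combine left and right with the operator" algorithm no longer applies as is.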

The point is that there is an obvious implementation for expression building, that would not be sufficient for every language. The problem lies in keeping the interface simple while giving absolute control to developers so that they could handle difficult scenarios like the one described above. The implementation I have come up with in my library seems to be sufficient to handle my specific requirements (I am not even sure of this at this point), but I do not think it is generic enough...

Anyway, how would you consider contributions for this? Patches? A fork?

Note that I may take my time: I too have other things on my plate ;-)

Mar 6, 2012 at 4:47 PM

About the problem you described with Identifiers. Man, I'm afraid you're not doing it the right way. You're bringing semantic analysis and transformations into the parsing process - at least that is what I understood. The parser should not care what things "are", only what they look like. It should not care whether an identifier refers to a list or a single object - the parse tree is the same. The parser converts text into a tree. Understanding and rewriting it is a later job done over the tree.

For example, that's how LINQ query is processed:

1. Query parsing - as an internal c# parse tree: text->tree

2. Rewriting it as a chain of method calls to Queryable methods - this chain replaces the LINQ expression; there are simple rules: tree->series of chained calls

3. At execution time (app is running), when query is constructed: execution of chained calls against initial data source (ITable or IEnumerable) - the result is an expression tree over the real data source.

4. Result enumeration - the query expression tree is transformed into a real query (SQL or a chain of calls to list methods); the real query is executed, result is enumerated

Note that 1 and 2 are done at compile time, with parsing happening at step 1. Steps 3 and 4 are done at runtime. That illustrates my point - producing the expression tree is separated from parsing in "space" and "time". The main point - parser is NOT concerned what object "is"; "identifier" is the only piece of information it cares about - it creates a parse tree with "identifier" and passes the tree down the pipeline to semantic analysis and rewriting stages. 
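The four steps can be observed from plain C#. The snippet below is only a small sketch over an in-memory array wrapped with AsQueryable; the LinqStages class and its method names are invented for the illustration.

```csharp
using System;
using System.Linq;

// Hypothetical names, for illustration only.
public static class LinqStages
{
    // Step 1: the compiler parses this query syntax...
    public static IQueryable<string> QuerySyntax(IQueryable<string> source)
    {
        return from s in source where s == "foo" select s;
    }

    // Step 2: ...and rewrites it into this chain of Queryable calls.
    public static IQueryable<string> MethodSyntax(IQueryable<string> source)
    {
        return source.Where(s => s == "foo");
    }

    public static void Main()
    {
        IQueryable<string> source = new[] { "foo", "bar" }.AsQueryable();

        // Step 3 (run time): executing the chain only builds an expression tree.
        var query = QuerySyntax(source);
        Console.WriteLine(query.Expression);
        Console.WriteLine(MethodSyntax(source).Expression);

        // Step 4: enumeration is what actually runs the query.
        foreach (var s in query)
            Console.WriteLine(s);
    }
}
```

Both methods should print the same expression, since the query syntax is just another spelling of the call chain; nothing in it knows what the data source really is.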

So when I was talking about output as an expression tree, I meant only real expressions; not all constructs in programming languages can be represented as expression trees. "WHERE" is one thing that is not there.

The best way to contribute is to make a fork.



Mar 6, 2012 at 9:50 PM

Sorry for the misunderstanding.

Having been immersed in LINQ expressions for a while, I understood you wanted to output LINQ expression trees. That is what I had attempted to do in my special case, and I tried to explain the problems I had encountered in the translation process. But rereading your previous answer, it seems that I have been the only one talking about LINQ the whole time...

I agree with everything you just explained (obviously, as you seem to have a much clearer idea of the whole thing than I do). I am indeed using Irony for the parsing and the AST output. In my case, the AST is an expression tree that I then proceed to rewrite as a LINQ expression tree, given a certain context.

So if I understand correctly what you are trying to do, you want to be able to produce an AST by default that is an expression tree. Right? So we would need to be able to automatically detect (or infer) the meaningful node types (operators, identifiers...) and the transient ones.

Am I getting closer?

Mar 15, 2012 at 9:13 PM

Hi Roman,

I'm also in the process of porting the old Irony version I was using (around September 2011) to the newest.

The new AST system is much better decoupled now, that's great. Though I have discovered an annoying mismatched behavior, compared to the previous version:

In the previous version, when a terminal had "NoAstNode", the AST construction was skipped for this node, but not for its children. The current implementation doesn't reproduce this behavior, and this is annoying as I'm heavily relying on it (for example, all my lists are often declared inline in the grammar, and I'm constructing the AST list from the parent node, not from the list node itself).

Do you think that it would be acceptable to change the AstBuilder.BuildAst method to match this behavior (in bold, a code replacement proposal)?

    public virtual void BuildAst(ParseTreeNode parseNode) {
      var term = parseNode.Term;
      // NEW BEHAVIOR: if (term.Flags.IsSet(TermFlags.NoAstNode) || parseNode.AstNode != null) return;
      if (parseNode.AstNode != null) return;
      //children first
      var processChildren = !parseNode.Term.Flags.IsSet(TermFlags.AstDelayChildren) && parseNode.ChildNodes.Count > 0;
      if (processChildren) {
        var mappedChildNodes = parseNode.GetMappedChildNodes();
        for (int i = 0; i < mappedChildNodes.Count; i++)
          BuildAst(mappedChildNodes[i]);
      }
      //create the node
      //We know that either NodeCreator or DefaultNodeCreator is set; VerifyAstData create the DefaultNodeCreator
      if (!term.Flags.IsSet(TermFlags.NoAstNode)) {
        var config = term.AstConfig;
        if (config.NodeCreator != null) {
          config.NodeCreator(Context, parseNode);
          // We assume that Node creator method creates node and initializes it, so parser does not need to call
          // IAstNodeInit.Init() method on node object. But we do call AstNodeCreated custom event on term.
        } else {
          //Invoke the default creator compiled when we verified the data
          parseNode.AstNode = config.DefaultNodeCreator();
          //Initialize node
          var iInit = parseNode.AstNode as IAstNodeInit;
          if (iInit != null)
            iInit.Init(Context, parseNode);
        }
      }
      //Invoke the event on term
      term.OnAstNodeCreated(parseNode);
    }//method

Do you think that it is an acceptable fix for the new ast node builder?
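To show the intended behavior in isolation, here is a toy model of it. ToyNode and ToyAstBuilder are made-up stand-ins, not Irony types, and the string used as the "AST node" is just a placeholder.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a parse tree node; not Irony's ParseTreeNode.
public class ToyNode
{
    public string Name;
    public bool NoAstNode;
    public object AstNode;
    public List<ToyNode> Children = new List<ToyNode>();
}

public static class ToyAstBuilder
{
    public static void BuildAst(ToyNode node)
    {
        if (node.AstNode != null) return;
        // Children are always visited, even under a node flagged NoAstNode.
        foreach (var child in node.Children)
            BuildAst(child);
        // Only the creation of this node's own AST node is skipped.
        if (!node.NoAstNode)
            node.AstNode = "Ast(" + node.Name + ")";
    }
}

public static class ToyProgram
{
    public static void Main()
    {
        var item = new ToyNode { Name = "item" };
        var list = new ToyNode { Name = "list", NoAstNode = true };
        list.Children.Add(item);
        ToyAstBuilder.BuildAst(list);
        Console.WriteLine(list.AstNode == null); // the flagged node has no AST node
        Console.WriteLine(item.AstNode);         // its child still got one
    }
}
```

With the early return on NoAstNode, the child would never be visited, which is the difference I am running into.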

As I have forked Irony to merge minor changes I did to Irony (like Scanner pluggability), I will be able to send you a pull request.

Mar 16, 2012 at 2:41 AM
Edited Mar 16, 2012 at 2:45 AM

Also, I'm getting the error "AstNodeType or AstNodeCreator is not set on non-terminals: {0}. Either set Term.AstConfig.NodeType, or provide default values in AstContext."  from AstBuilder.VerifyLanguageData for Transient nodes. Previous version of Irony didn't complain about it.

What is the correct way to handle this? Are we supposed to specify a custom AstContext and DefaultNodeType, or should we add a transient check to AstBuilder.VerifyLanguageData (like this)?


if (term.Flags.IsSet(TermFlags.NoAstNode) || term.Flags.IsSet(TermFlags.IsTransient)) continue;



Mar 16, 2012 at 6:58 AM

Will look at it. Sorry, some things got messed up in AST during refactoring.