ParseTreeNode.ChildNodes, indexing by term name instead of index?

Oct 9, 2013 at 3:02 PM
When writing an AST mapping, one has to make many calls such as myNode.ChildNodes[3] which are not overly readable. It would be nice to be have a second index string -> ParseTreeNode implicitly mapping ParseTreeNode.Term.Name - so that the example above could become myNode.ChildNodes["Condition"].

What do you think?
Coordinator
Oct 10, 2013 at 6:03 PM
that's something I considered, but the problem is - it might be VERY confusing. Terminal names are something identifying the 'nature' of token, like identifier, while the child node you try to get is more about 'role'. Like field definition might look like:

var identifier = new IdentifierTerminal("identifier");
field.Rule = identifer + identifier + ";"

The first identifier is field's type, while the second is field name. Getting child node by name 'identifier' would not work here. You can always add an extension method that does this for you, but you would clearly understand its limitations, like the case just shown. So that's why I did not add it out of the box - might be really confusing in many cases
Roman
Developer
Oct 10, 2013 at 6:22 PM
Edited Oct 10, 2013 at 6:31 PM
I remember discussing this problem with Roman a year (or maybe two years) ago.
Using child node indexes adds tight coupling between the grammar and the code that builds AST.
Once you adjust the grammar, you have to update the AST builder, otherwise it breaks.
My proposal was to add names to child nodes (to set their "roles") like this:
IfExpr.Rule = ToTerm("if") + Expr.Named("Expression") + "Then" + Stmt.Named("IfBranch") +
    "Else" + Stmt.Named("ElseBranch");
so that you can write myNode.ChildNodes["Expression"] and so on.
But Roman disliked this approach saying it's ugly as hell :)
I admit that it looks a bit clumsy, but I still think that getting rid of hardcoded numbers would be a big improvement.

Maybe we could combine these two approaches?
If the term is unique in BNF rule, we could use it's default name (like "identifier" mentioned above).
But if the term is non-unique, we could name it explicitly:
var identifier = new IdentifierTerminal("identifier"); 
field.Rule = identifer + identifier.Named("FieldName") + ";" 
Any better ideas? :)
Coordinator
Oct 10, 2013 at 6:30 PM
Yeah, look at the expression (it doubles in length) and imagine bigger, much more complex rules in real languages.
This in turn introduces another tight coupling and with loose/weak name matching, don't you think? AST nodes are supposed to be shared between languages (c# vs VB.NET); then particular grammar (c#) should use precise role names as arguments, everywhere. And you cannot easily turn on/off using/building AST. There's a facility called child node mapping, you can specify how to remap the order of child nodes before it is submitted to AST node constructor, it is shown somewhere in samples. At least this piece (all statement specifying maps + ast node type) can be moved to a separate block and enabled/disabled with one switch
Coordinator
Oct 10, 2013 at 6:34 PM
again, I spent some time hard thinking about this; I know this is ugly, but always consider the consequences of introduction of a new 'convenience' feature - how many uses might be confused, use it inappropriately and scream for help here? this is always a big consideration.
Developer
Oct 10, 2013 at 6:39 PM
Edited Oct 10, 2013 at 6:39 PM
This in turn introduces another tight coupling and with loose/weak name matching, don't you think?
 
I completely agree with that.
But I still hope to get rid of node indices somehow :)
Oct 10, 2013 at 7:25 PM
Thank you all for your very interesting insights. There are indeed a lot of implications to consider.