Or Operator

Jan 11, 2010 at 3:43 PM


I have been having some problems when trying to use the OR (|) operator to "extend" the possible productions for a non-terminal.  I am trying to do this inside of a derived grammar's constructor in order to extend a base grammar.

To keep things (relatively) simple, suppose I had a base grammar called "BasicExpressionGrammar" as follows:

[Language("Basic Expression Grammar", "1.0", "Base grammar for simple expressions with unary and binary operators")]
public abstract class BasicExpressionGrammar : Grammar
  private NonTerminal m_Expression;
  private NonTerminal m_UnaryOperator;
  private NonTerminal m_BinaryOperator;
  private NonTerminal m_Term;

  public BasicExpressionGrammar()
    m_Expression = new NonTerminal("Expression");
    m_UnaryOperator = null;
    m_BinaryOperator = null;
    m_Term = new NonTerminal("Term");

    var parenExp = new NonTerminal("ParenthesisExpression");

    this.Root = m_Expression;
    m_Expression.Rule = m_Term;
    parenExp.Rule = "(" + m_Expression + ")";
    m_Term.Rule = parenExp;

    MarkTransient(m_Expression, m_Term, parenExp);

    RegisterPunctuation("(", ")");

    // Ignore new lines

    this.LanguageFlags =
      LanguageFlags.CreateAst |

  protected void AddUnaryOperator(params string[] ops)
    if(ops == null || ops.Length == 0) return;

    if(m_UnaryOperator == null)
      m_UnaryOperator = new NonTerminal("UnaryOperator");
      var unaryExp = new NonTerminal("UnaryExpression", typeof(UnExprNode));
      // When the user specifies ANY unary operators,
      // we need to add them to the list of productions from
      // the Expression non-terminal
      m_Expression.Rule = unaryExp | m_Expression.Rule;
      unaryExp.Rule = m_UnaryOperator + m_Term;

    foreach(var op in ops)
      if(m_UnaryOperator.Rule == null)
        m_UnaryOperator.Rule = op;
        // Here we need to keep expanding the possible unary
        // operators that can be used.  Can be expanded over
        // multiple calls to this function (from sub-grammars,
        // or even sub-sub-grammars).
        m_UnaryOperator.Rule = op | m_UnaryOperator.Rule;

  protected void AddBinaryOperator(params string[] ops)
    // See AddUnaryOperator -- basically the same.

  protected void AddExpressionTerm(params BnfTerm[] terms)
    foreach(var term in terms)
      // Here we are adding new terms that the operators
      // can act on.
      m_Term.Rule = term | m_Term.Rule;
As you can see, the base grammar doesn't provide much more than a shell for sub grammars to plug into.  The base grammar could be much more complicated than this.
A derived grammar could define what operators it wants to use, as well as what "things" can be operands.  Take a sub grammar that performs integer arithmetic:
[Language("IntegerExpressionGrammar", "1.0", "Evaluates integer expressions")]
public class IntegerExpressionGrammar : BasicExpressionGrammar
  public const NumberFlags IntegerFlags =
    NumberFlags.IntOnly |

  public IntegerExpressionGrammar()
    var constant = new NumberLiteral("Constant", IntegerFlags);
    var someFunctionParam = new NumberLiteral("SomeFunctionParam", IntegerFlags);

    // Use your imagination as to what this function will do.
    var someFunction = new NonTerminal("SomeFunction", typeof(SomeFunctionProcessingNode));

    someFunction.Rule = "Foo(" + someFunctionParam + ")";

    // If I wanted, I don't have to have any unary ops...
    // ... and/or any binary ops
    AddBinaryOperator("+", "-", "*", "/");
    AddExpressionTerm(constant, someFunction);
Here we have defined a subgrammar that performs arithmetic evaluation of integer expressions with some function called "Foo".  Foo could easily be "Sum", "Min", "Max", etc.
Some things to note here are that the base grammar's functionality needs to expand (new terms, operators, etc.) as the sub grammar needs it to.
However, I do not know in the sub grammar what all the various "Term"s could be at any given time (nor do I care!).  It just wants to be able to "add" the function "Foo" as a possible term.
There could be a sub (sub) grammar that extends IntegerExpressionGrammar, and it won't care what all is in Term either.
I have considered having something like "pre-defined" productions with a "custom" non-terminal that can get added.
// Data member declaration
protected NonTerminal m_CustomUnaryOp;

// In constructor:
var PredefinedUnaryOp = new NonTerminal("PredefinedUnOp");
m_CustomUnaryOp = new NonTerminal("CustomUnOp");

PredefinedUnaryOp.Rule = ToTerm("-");
m_CustomUnaryOp.Rule = null;
However, this has a few problems:
1) When I set the production for m_CustomUnaryOp, I would need to update PredefinedUnaryOp's rule to be:
PredefinedUnaryOp.Rule = ToTerm("-") | m_CustomUnaryOp;
and the sub classes don't have access to PredefinedUnaryOp.
2) If I gave access to PredefinedUnaryOp, they could easily replace the rule for that and ignore m_CustomUnaryOp altogether.  (Sub classes should not be able to "remove" existing productions).
3) If I just set:
PredefinedUnaryOp.Rule = ToTerm("-") | m_CustomUnaryOp
and the sub grammar never sets a rule for m_CustomUnaryOp, then there will be an error constructing either the grammar or the Ast (I forget which) since the rule is null.
4) If I provide a protected function to perform the update necessary in #1 when the user wants to add custom unary operators, there is nothing that prevents a sub sub [sub...] grammar from calling the same function and replacing the custom operator defined in the sub grammar.
I would need to keep overriding this function in every sub grammar and providing yet another custom operator non terminal for it's sub grammar to use... (Which can get messy quickly with a handful of these "extensions" needed for a grammar).
It would be far nicer for the OR (|) operator to allow the addition of extra productions in a syntax such as in AddUnaryOperator:
m_UnaryOperator.Rule |= someNewOperator;
m_UnaryOperator.Rule = m_UnaryOperator.Rule | someNewOperator;
m_UnaryOperator.Rule = someNewOperator | m_UnaryOperator.Rule;
and then call this inside a protected function so that the sub grammars cannot directly modify m_UnaryOperator's rule.
Jan 11, 2010 at 8:45 PM

I think you should try the following. Instead of using overloaded "|" and "+" operators when modifying rules, you should dig into BnfExpression (Rule property) and make careful alteration of its internal structure in code.

Look at internals of operator overloads, here's a sample. Instead of

    m_Expression.Rule = unaryExp | m_Expression.Rule;

do the following:

  var newList = new BnfTermList();



Look at BnfExpression.Data field - it is OR list of Plus-lists of BNF terms. So modify it directly in code. I think that'll do it



Jan 12, 2010 at 5:35 PM
Edited Jan 12, 2010 at 8:44 PM


First of all, thank you for your help.  That sounds like a great suggestion.  However, BnfExpression.Data is an internal property, and I cannot access it from outside of the assembly.

I rewrote the OR (|) operator as:

internal static BnfExpression Op_Pipe(BnfTerm term1, BnfTerm term2) {
  BnfExpression expr1 = term1 as BnfExpression;
  if (expr1 == null)
    expr1 = new BnfExpression(term1);
  BnfExpression expr2 = term2 as BnfExpression;
  if (expr2 == null)
    expr2 = new BnfExpression(term2);
  return expr1;

After some testing, it seems to work fine.  The ExpressionEvaluatorGrammar in the Irony.Samples builds, parses, and runs correctly even when I change:

Expr.Rule = Term | UnExpr | BinExpr | PostFixExpr;


Expr.Rule = Term;
Expr.Rule = Expr.Rule | UnExpr;
Expr.Rule = BinExpr | Expr.Rule;
Expr.Rule |= PostFixExpr;

Basically, I just changed it so that it gets both terms as expressions.  Then it "OR"s all of the "Plus" lists in both expressions together (the call to AddRange(...)).

I haven't tested it with GrammarHints (the only other kind of BnfTerm at the moment) as I have no idea what they really are to begin with.

Do you see any problems with this new OR operator implementation?



I have found one very minor side-effect of this implementation of the OR operator: The ToString() method on BnfExpression does not always display correctly.

This is due to the fact that the BnfExpression's Data is changed (added to) when it is used on the left hand side of the OR operator (as in Expr.Rule = Expr.Rule | UnExpr), and the string used for ToString() is cached.

The underlying Data is stored correctly, but the ToString() does not update to represent that.  The simplest solution is to not cache the ToString().

Jan 13, 2010 at 4:47 PM

Well, I think this is an interesting suggestion and might (!) be a good improvement. Let me think about this, and if everything works with this I'll put it into next code drop



Jan 21, 2010 at 11:49 PM


I have given this topic a little more thought, and I have a few more suggestions.

1) The solution I had given previously for the OR (|) operator does not follow normal operator guidelines; Operators should not modify the operands.
I propose a slightly modified version of the OR (|) operator:

internal static BnfExpression Op_Pipe(BnfTerm term1, BnfTerm term2) {
  var expr1 = term1 as BnfExpression;
  if (expr1 == null)
    expr1 = new BnfExpression(term1);
  var expr2 = term2 as BnfExpression;
  if (expr2 == null)
    expr2 = new BnfExpression(term2);
  var result = new BnfExpression();
  return result;

The biggest difference here is simply that we do not change the first expression, but rather use it to compute a separate result expression.
Since term1 is no longer modified (if it was a BnfExpression to start with), there is no more problem with caching the ToString() method once again.
Also, term1 can safely be reused in other rules:

BnfExpression someExpression = ToTerm("Hello") + "World";
NonTerminal bobSays = new NonTerminal("BobSays");
bobSays.Rule = someExpression | ToTerm("Goodbye");
NonTerminal susanSays = new NonTerminal("SusanSays");
susanSays.Rule = someExpression | ToTerm("Wow");

2) The AND (+) operator currently allows expressions to be formed that are not in BNF.  BNF states that you can only have "OR" lists of "PLUS" lists of terms.
However, the current implementation of the AND (+) operator allows the following expression:

BnfExpression someExpression =
  ToTerm("This") + (ToTerm("is") | "not" | "in" | "BNF")

Logically, this should not be allowed since it creates an expression that is not in BNF.  This statement also creates problems when you try to use
it as a NonTerminal's Rule with an AST-building grammar.  Those pesky extra NonTerminals start showing up that don't have an AstNode type
associated with them ("NT0", "NT1", etc.).  In this case, the extra NT "NT0" shows up as the "is|not|in|BNF" portion of the rule.
(With someExpression being created as "This+NT0").  The lack of an AstNode type causes exceptions to get thrown when constructing the AST.

To solve both of these issues, I propose a new AND (+) operator:

// Plus/sequence
internal static BnfExpression Op_Plus(BnfTerm term1, BnfTerm term2) {
  var expr1 = term1 as BnfExpression;
  if (expr1 == null)
    expr1 = new BnfExpression(term1);
  var expr2 = term2 as BnfExpression;
  if (expr2 == null)
    expr2 = new BnfExpression(term2);

  // If either one is a Pipe-type expression, it is not BNF
  if (expr1.Data.Count > 1 || expr2.Data.Count > 1)
    throw new ArgumentException("Invalid expression");

  var result = new BnfExpression();
  result.Data.Add(new BnfTermList());

  return result;

Basically, it gets both terms as an expression that should be just a single Plus-list.  If either one of them has more than one
Plus-list (A "Pipe-type" expression), then it is not in proper BNF, and an exception is thrown.  The result is an expression
that is a single Plus-list of everything that was in the Plus-list of the first expression followed by the Plus-list of the second

3) In order for these operators to work correctly, the BnfExpression's constructors just be modified slightly:

public BnfExpression(BnfTerm element) : this() {
  Data.Add(new BnfTermList());
public BnfExpression() : base(null) {
  Data = new BnfExpressionData();

The only difference is that the line "Data.Add(new BnfTermList());" got moved from the default constructor, to the initial value constructor.
The reason for this change is so that when creating an empty expression (via the default constructor), there isn't already a single, empty Plus-list.
Without this change, the new OR and AND operators would leave some empty Plus-list "artifacts" with the calls to AddRange.

For example, an expression such as:

BnfExpression someExpression =
  ToTerm("This") | "is" | "a" | "test";

would have an OR-list looking like:

[<emptyPlusList>, <emptyPlusList>, <emptyPlusList>, "This", "is", "a", "test"]


What do you think about these three changes?


Jan 23, 2010 at 7:15 AM

Ok, thanks for the suggestions!

1) - changing the Op_pipe implementation. Honestly, I don't see a big value in this change. About caching the ToString value - I'm in fact removing it, it will no longer be a problem. As for reusing "expressions" - this is quite "uncommon" way of forming expressions I think, and I don't see much value in it. What is unusual in this pattern is that you're reusing segments of expressions instead of defining a new non-terminal enclosing the repeated segment - which makes sense in most cases.

2) What you mean probably is that "+" operator forms expressions that are not in "normalized" form - set of productions. You suggest to disallow this at the level of expressions. Irony automatically normalizes such expressions by adding extra non-terminals. You say - it is bad because new nonterminals don't have AST node assigned. But not every grammar has a purpose of creating AST tree, some applications are ok with just parse tree. An example - SearchGrammar in samples, it does not need AST, it works directly with parse tree nodes. And other examples are quite viable - code colorizers, formatters, converters. So in grammars like these, prohibiting use of OR operator inside PLUS expressions would be quite limiting and annoying. Even if a grammar creates AST, still in some cases such no-AST nodes can make sense. I'm actually thinking about adding NoAst flag to term flags - it would allow mark some non-terminals that they don't need AST node, so grammar validation would not throw error.

3) - is necessary only for 1) and 2)

So for now I would prefer leave things as is. I already put in place the change to OP_Pipe you suggested previously.