Non-grammar terminals priority

Developer
Mar 17, 2010 at 2:38 PM
Edited Mar 17, 2010 at 2:39 PM

Hi, Roman!

Seems like non-grammar terminals have the highest priority in the latest code drop.

My grammar allows single-line comments starting with the star symbol (*). The same symbol is used in arithmetic expressions for multiplication operator.

In Jan-24 version of Irony I just lowered the priority for LineCommentTerminal to make things work. The trick doesn't work anymore because of this:

/* Scanner.cs, line 100 */

//Scans the source text and constructs a new token
private void ScanToken() {
    if (!MatchNonGrammarTerminals())
        MatchRegularToken();      
    //...
}

Do you have any suggestions on how to overcome this issue? I don't think it's a good idea to make custom comment terminal for such a small problem :)

Coordinator
Mar 17, 2010 at 2:45 PM

What is the ultimate rule to distinguish the star starting comment and star as arithm operator?

There are some good reasons behind this change of handling non-grammar terminals, it now allows grammars using indentation (and CodeOutlineFilter) like Python to use scanner-parser link - when scanner uses the current parser state to resolve ambiguities. 

Developer
Mar 17, 2010 at 2:56 PM

The ultimate rule is just a priority ;)

If the star can be part of an expression, then it is star terminal. If isn't, then it's a part of a comment. In pseudocode:

  1. try to match non-grammar terminals
  2. try to match regular tokens (is it where the parser state is used to resolve ambiguities?)
  3. select terminal with the highest priority
Developer
Mar 17, 2010 at 3:17 PM
Edited Mar 17, 2010 at 3:20 PM

Hmmm, making decision like this isn't safe. It seems to be language design flaw.

If I had a comment that looks like an expression, it will be compiled as a part of an expression, leading to mysterious bugs :)

So there must be some more reliable rule...

Is there any way to specify where non-grammar terminals can be applied? I can think of two options:

  1. Star comments are allowed outside methods (or outside statements, for example), but not within.
  2. Star comments are only allowed at the beginnings of lines (after newline + whitespace). This is much more restrictive.

Is it possible to implement either option using standard features of CommentTerminal? Or maybe it can be done with some custom token filter?

Coordinator
Mar 17, 2010 at 5:16 PM

You can hook into ValidateToken event of comment terminal and reject it if it does not appear in the right position. One possibility: in the event handler you can check current parser state, and see if it expects "*" as operator - these are places inside arithm expressions. If parser expects star operator, you reject Comment token and let scanner rescan it as operator

Roman

Developer
Mar 17, 2010 at 7:23 PM
Edited Mar 17, 2010 at 7:26 PM

Thanks a lot for this suggestion!

I added validation event handler, but it didn't work:

void LineComment_ValidateToken(object sender, ParsingEventArgs args)
{
  // if "*" is allowed in the current parser state, suppress comments starting with "*"
  var parserState = args.Context.CurrentParserState;
  if (parserState.ExpectedTerminals.Contains(ToTerm("*")))
  {
    // rewind input stream and reject the token
    args.Context.SetSourceLocation(args.Context.CurrentToken.Location);
    args.Context.CurrentToken = null;
  }
}

Then I found that scanner don't validate non-grammar tokens. So I added token validation and everything went ok.

Here is a tiny patch for Scanner.cs:

Index: Scanner.cs
===================================================================
--- Scanner.cs	(revision 47769)
+++ Scanner.cs	(working copy)
@@ -121,6 +121,8 @@
         return false;
       foreach(var term in terms) {
         Context.CurrentToken = term.TryMatch(Context, Context.Source);
+        if (Context.CurrentToken != null) 
+          term.InvokeValidateToken(Context);
         if (Context.CurrentToken != null) return true; 
       }
       return false;
Coordinator
Mar 17, 2010 at 7:28 PM

Good catch, will add this piece

 

Coordinator
Mar 17, 2010 at 7:32 PM
Edited Mar 17, 2010 at 7:34 PM

One more thing - you don't need to reset location when you reject the token, scanner will do it automatically

 

Developer
Mar 17, 2010 at 7:46 PM

I thought so.

But the handler didn't work until I added SetSourceLocation(…).

Coordinator
Mar 17, 2010 at 9:03 PM

Oopss.. then I'll fix that too

Coordinator
May 2, 2010 at 12:45 AM

Fixed

 

May 2, 2010 at 10:53 AM

Hi Roman, just found the project -- it's fantastic, thanks for it!

Also, wondering if I found a bug or I'm just confused. In the operator parsing code, we get to this section (pasted below) which seems to march up the stack looking to see if there's an operator of higher precedence, and if so it does a reduce. But, it will transverse delimiters like close-parentheses, close braces, etc. Shouldn't it stop when it encounters one of those?

Thanks,

Matt

 

private ParserActionType GetActionTypeForOperation(ParserAction action) {
      for (int i = Context.ParserStack.Count - 1; i >= 0; i--) {
        var  prevNode = Context.ParserStack[i];
        if (prevNode == null) continue;
        if (prevNode.Precedence == BnfTerm.NoPrecedence) continue;
        ParserActionType result;
        //if previous operator has the same precedence then use associativity
        var input = Context.CurrentParserInput;
        if (prevNode.Precedence == input.Precedence)
          result = input.Associativity == Associativity.Left ? ParserActionType.Reduce : ParserActionType.Shift;
        else
          result = prevNode.Precedence > input.Precedence  ? ParserActionType.Reduce : ParserActionType.Shift;
        return result;
      }
      //If no operators found on the stack, do simple shift
      return ParserActionType.Shift;
    }

 

Coordinator
May 3, 2010 at 5:06 PM

No, it is not a bug and I would say you're a bit confused :) I would admit, this traversal of ALL child nodes is not optimal, and can be optimized to check only certain nodes that can be operators, based on productions in the current state. But in any case, the traversal of all is correct in general. You would not see a close parenthesis in traversal - these should have been already reduced and sitting in parser stack as simply Expr node or smth like this. Just step thru in debugger in ExprEvaluator sample with test expression like "1 + (2 + 3) * 4 * 5 ". Put a breakpoint in this method and inspect the contents of Context.ParserStack and see what nodes are sitting there at each stop - you would see clearly how it works. 

Roman