Is there a straightforward way to resolve shift/reduce conflicts with many optional elements?

Sep 28, 2012 at 7:14 AM

The grammar I am working with explicitly specifies the places where newlines are allowed. For example:

trap_statement:
    trap  new_lines_opt   type_literal_opt   new_lines_opt   statement_block

I can write this:

 

            trap_statement.Rule =
                  "trap" + new_lines_opt + (type_literal | Empty) + new_lines_opt + statement_block;

which looks a lot like the specified grammar - great. But it has a shift/reduce conflict around the newlines. I found I was able to eliminate the conflict by rewriting the rule like this:

            trap_statement.Rule =
                ("trap" + statement_block) |
                ("trap" + new_lines + statement_block) |
                ("trap" + type_literal + statement_block) |
                ("trap" + new_lines + type_literal + statement_block) |
                ("trap" + type_literal + new_lines + statement_block) |
                ("trap" + new_lines + type_literal + new_lines + statement_block)
                ;

Which is pretty ugly, and the resulting parse tree is a mess, too. 

Even worse, I really don't care about these shift/reduce conflicts, since the newlines have no semantic value - they're only in the grammar because of restrictions about where they're allowed.

Is there a shortcut, or alternative approach to this problem? So I don't have to rewrite all the rules?

I'm using MarkTransient on newline_opt - would it be appropriate to elide reports of shift/reduce errors on transient nodes?

Sep 28, 2012 at 10:07 AM

I use the PreferShiftHere() pseudo-token. Insert it in front of the terms that are reporting the conflict.

There are also ReduceHere() and ReduceIf() pseudo-tokens, but I usually rewrite the grammar to avoid reduce-reduce conflicts.

Coordinator
Sep 28, 2012 at 6:57 PM

I'm looking into this, interesting case. Give me some time

Coordinator
Sep 28, 2012 at 8:08 PM
Edited Sep 28, 2012 at 8:08 PM

Ok, I looked at pwshell grammar (here: http://www.manning.com/payette/AppCexcerpt.pdf)

The rule for statement separator is:   

<StatementSeparatorToken> = ';' | '&&' | '||' | <end-of-line>

So NewLine is a whitespace everywhere except where statement separator is expected. Our trick would be to use NewLine in grammar (this would automatically exclude it from whitespace), but to intercept and replace it with artificial "skip" token when parser does not expect statement separator. This skip token has IsNonGrammar flag set, so parser will skip it. 
So define StatementSeparator as a field, and initialize it in constructor: 

   StatementSeparator.Rule = NewLine | ";" | "&&" | "||";

Define class-scope SkipTerminal: 

  Terminal _skipTerminal = new new Terminal("Skip", TokenCategory.Outline, TermFlags.IsNonGrammar);


Hook to NewLine.ValidateToken event in grammar constructor and add the handler:

 

          NewLine.ValidateToken += NewLine_ValidateToken; 
          //....... 
        void NewLine_ValidateToken(object sender, ValidateTokenEventArgs e) {
          if (!e.Context.CurrentParserState.Actions.ContainsKey(this.StatementSeparator)) {
            var current = e.Context.CurrentToken;
            e.ReplaceToken(new Token(_skipTerminal, current.Location, current.Text, current.Value);
          }
        }

         

 

The method checks if StatementSeparator is expected; if not, it will replace NewLine with Skip token. 

Now forget about all these new_lines_opt in rules - they should be skipped automatically by parser.

Try this, I will think about how to make it easier to handle cases like this. 

Roman

Sep 28, 2012 at 8:27 PM

Thanks for looking at this, Roman. 

FYI, that's not the grammar I'm working from - that's more of a sketch of a grammar, for humans to read when PowerShell was new, but not to write a parser from. There's a lot missing, and a lot of errors.

There is an official grammar in the PowerShell Language Specification ( http://www.microsoft.com/en_us/download/details.aspx?id=9706). It's much more complete, although I'm discovering a few errors as I go. I guess I'm the first one to ever use it.

From the spec, section 2.2.4:

Unlike some popular languages, PowerShell does not consider line-terminator characters (§2.2.2) to be white space. However, a line terminator can be treated as white space by preceding it immediately by a backtick character, ` (U+0060). This is necessary when the contents of a line are complete syntactically, yet the following line contains tokens intended to be associated with the previous line. 

The "syntactically complete" bit is important. For example:

 

PS> if ($false)     # this if includes the block
>> {
>>     "hi"
>> }
>>

 

Here, the grammar for `if` requires a block to follow, so the parser will continue looking on the next line for that block. Since the open brace requires a close brace, the parser will taking input until it gets that close brace. At that point it will evaluate the entire input. Because the condition is false, the block won't run, and no output will appear. This is all one statement.

PS> Set-Item function:prompt   # this command does not include the block
PS> {
>>    "PS> "
>> }
>>

   "PS> "
In this case, the first `Set-Item` command will accept a block as a parameter, but the parser doesn't know that. Instead, the grammar says a command ends with a newline. It processes the `Set-Item` command when you hit ENTER. Then it sees the open brace and starts parsing again. When you're done, the block is executed, and produces an output. This is all 2 statements.

 

Coordinator
Sep 28, 2012 at 9:09 PM

One important note: it seems like you're trying to define several grammars, one for REPL loop, and another for parsing modules in text files. I think it's not a good approach. Define one grammar, as if it was for parsing modules. Then for REPL/command line, do parsing in 'partial' mode - Parser supports this. Look at mini-Python grammar and command line mode handling for it.

Roman 

Apr 6, 2013 at 12:14 AM
Roman,

I have been trying to implement your suggestion about hooking to NewLine.ValidateToken, and I'm not having any luck. What I'm seeing is that my ValidateToken hook is never executing.

I debugged around, and have a guess what is going wrong. See Scanner.NextToken():
    private void NextToken() {
      //1. Check if there are buffered tokens
      if(Context.BufferedTokens.Count > 0) {
        Context.CurrentToken = Context.BufferedTokens.Pop();
        return; 
      }
      //2. Skip whitespace.
      _grammar.SkipWhitespace(Context.Source);
      //3. That's the token start, calc location (line and column)
      Context.Source.Position = Context.Source.PreviewPosition;
      //4. Check for EOF
      if (Context.Source.EOF()) {
        Context.CurrentToken = new Token(_grammar.Eof, Context.Source.Location, string.Empty, _grammar.Eof.Name);;
        return; 
      }
      //5. Actually scan the source text and construct a new token
      ScanToken(); 
    }//method
Since the SkipWhitespace() comes before the ScanToken(), the NewLine.ValidateToken never gets called.

Am I reading this correctly? Does this match what you expect?
Apr 6, 2013 at 12:27 AM
Upon further debugging, I discovered:
  1. You were assuming I was not consuming \r\n in SkipWhitespace(). That's a reasonable assumption, since the NewLine token quietly sets Grammar.UsesNewLine, although I had overloaded SkipWhitespace().
  2. You write if (!e.Context.CurrentParserState.Actions.ContainsKey(this.StatementSeparator)) {, but I missed the !.
Fixing up my code now...