What is GrammarExplorer telling me about this shift-reduce conflict?

Mar 20, 2013 at 2:53 AM
Edited Mar 21, 2013 at 3:03 AM
There's so much text here, and I'm overwhelmed. What does this say?
State S79 (Inadequate)
  Shift-reduce conflicts on inputs: [ , + dash ++ dashdash
  Shift items:
    cast_expression -> type_literal ·unary_expression 
    unary_expression -> ·primary_expression 
    primary_expression -> ·value 
    value -> ·parenthesized_expression 
    parenthesized_expression -> ·( pipeline ) 
    value -> ·sub_expression 
    sub_expression -> ·$( statement_list_opt ) 
    value -> ·array_expression 
    array_expression -> ·@( statement_list_opt ) 
    value -> ·script_block_expression 
    script_block_expression -> ·{ script_block } 
    value -> ·hash_literal_expression 
    hash_literal_expression -> ·@{ hash_literal_body_opt } 
    value -> ·literal 
    literal -> ·integer_literal 
    integer_literal -> ·decimal_integer_literal 
    integer_literal -> ·hexadecimal_integer_literal 
    literal -> ·real_literal 
    literal -> ·string_literal 
    string_literal -> ·expandable_string_literal 
    string_literal -> ·verbatim_string_literal 
    value -> ·type_literal 
    type_literal -> ·[ type_spec ] 
    value -> ·variable 
    primary_expression -> ·member_access 
    member_access -> ·primary_expression Unnamed6 member_name 
    primary_expression -> ·element_access 
    element_access -> ·primary_expression [ expression ] 
    primary_expression -> ·post_increment_expression 
    post_increment_expression -> ·primary_expression ++ 
    primary_expression -> ·post_decrement_expression 
    post_decrement_expression -> ·primary_expression dashdash 
    unary_expression -> ·expression_with_unary_operator 
    expression_with_unary_operator -> ·, unary_expression 
    expression_with_unary_operator -> ·-not unary_expression 
    expression_with_unary_operator -> ·! unary_expression 
    expression_with_unary_operator -> ·-bnot unary_expression 
    expression_with_unary_operator -> ·+ unary_expression 
    expression_with_unary_operator -> ·dash unary_expression 
    expression_with_unary_operator -> ·pre_increment_expression 
    pre_increment_expression -> ·++ unary_expression 
    expression_with_unary_operator -> ·pre_decrement_expression 
    pre_decrement_expression -> ·dashdash unary_expression 
    expression_with_unary_operator -> ·cast_expression 
    cast_expression -> ·type_literal unary_expression 
    expression_with_unary_operator -> ·-split unary_expression 
    expression_with_unary_operator -> ·-join unary_expression 
  Reduce items:
    value -> type_literal · [assignment_operator . :: [ ++ dashdash , .. format_operator * / % + dash as ccontains ceq cge cgt cle clike clt cmatch cne cnotcontains cnotlike cnotmatch contains creplace csplit eq ge gt icontains ieq ige igt ile ilike ilt imatch ine inotcontains inotlike inotmatch ireplace is isnot isplit join le like lt match ne notcontains notlike notmatch replace split -band -bor -bxor -and -or -xor 2>&1 1>&2 file_redirection_operator | ; new_line_character EOF } ) = ]]
  Transitions: unary_expression->S187, primary_expression->S158, value->S60, parenthesized_expression->S61, (->S62, sub_expression->S63, $(->S64, array_expression->S65, @(->S66, script_block_expression->S67, {->S68, hash_literal_expression->S69, @{->S70, literal->S71, integer_literal->S72, decimal_integer_literal->S73, hexadecimal_integer_literal->S74, real_literal->S75, string_literal->S76, expandable_string_literal->S77, verbatim_string_literal->S78, type_literal->S79, [->S80, variable->S81, member_access->S82, element_access->S83, post_increment_expression->S84, post_decrement_expression->S85, expression_with_unary_operator->S97, ,->S98, -not->S99, !->S100, -bnot->S101, +->S102, dash->S103, pre_increment_expression->S104, ++->S105, pre_decrement_expression->S106, dashdash->S107, cast_expression->S108, -split->S109, -join->S110
The grammar is from the PowerShell language spec: http://www.microsoft.com/en-us/download/details.aspx?id=9706.

My project is here: https://github.com/Pash-Project/Pash/tree/method-invocation/Source/Pash.System.Management
Mar 21, 2013 at 5:19 AM
Well, all I can say is just reword the conflict message: Parser does not know what to do when it pushed 'value' and sees symbols '[ , + dash ++ dashdash' in the input. I tried to look at your grammar in sources, but gave up, it's too big.
It seems to me in this particular case, the conflict is between the very first shift item and reduce item:

Shift item:
cast_expression -> type_literal *unary_expression
Reduce item:
value -> type_literal *

Parser cannot decide which action to choose when it pushed type_literal and sees things like dash in the input stream. I guess it might be part of artihm expression like 'x-y' or cast expr like 'INT -3' (not sure what cast syntax is in PS).
the only advice I can give is to try to reduce number of all these nonterminals that refer to different subtypes of values and literals. Like grammars often define 'TypeName' as a special term, and use it in rules; if you try to translate it into Irony and make TypeName nonterminal with rule = Identifier; then it causes a lot of conflicts. Because for the parser type names and variable names are indistinguishable, and should be all identifiers. So TypeName should not be introduced in Irony grammar. Same situation might be here - what is type_literal, seems like it should be simply identifier.

that's all I can guess

Mar 22, 2013 at 10:25 AM
Thanks Roman.

In PowerShell, a type literal looks like [int]. You could convert an int to a string with a cast expression: [int] "7". Types can be values, so you can do things like $t = [int].

So perhaps [int] -3 can parse as either "subtract 3 from [int]" or "cast -3 to [int]", and that's the ambiguity in the grammar. Is that what the error is saying?
Mar 22, 2013 at 4:28 PM
Answering your question - well, probably so, that's my best guess
Mar 29, 2013 at 9:17 PM
So to make things more interesting (and confusing for me):

A type literal in PowerShell looks like [int].

As the name suggests, you can use a type literal as a value:
    $x = [int]              # OK
You can also cast any value with a type literal:
    $x = [double]7          # $x is of type `double` now.
    $x = [string][int]7     # casts 7 to int first, then to string
    $x = [string][int]      # casts the Type object to string. This is legal in the grammar.
Member access is done with .. For example:
    [int].Assembly          # OK
    $x = "Assembly"
    [int].$x                # same thing
Since type literals are values, it's even legal to do:
    [int].[string]          # OK!
I'm not surprised that Irony is finding the grammar ambiguous, but I'm not sure what to do about it.
Mar 29, 2013 at 10:54 PM
do not distinguish between type literal and value/variable - even if it's done in formal grammar for the language - good for clarity of explaining syntax, but bad for LALR parser; so skip it in Irony grammar. Simply put, do not distinguish (as different non-terminals) things that are too similar syntax-wise. Do the semantic checks after parsing, over parse tree
Mar 29, 2013 at 11:49 PM
Once again, it is clearly to me that I am working at the edge of my ability. I'm not really sure what you're suggesting.

Here is an abridged excerpt from my grammar:
primary_expression.Rule = value | _member_access_or_invocation_expression | element_access | post_increment_expression | post_decrement_expression;

unary_expression.Rule = primary_expression | expression_with_unary_operator;

expression_with_unary_operator.Rule = (pre_increment_expression) | (pre_decrement_expression) | (cast_expression);

cast_expression.Rule = type_literal + unary_expression;

value.Rule = parenthesized_expression | sub_expression | array_expression | script_block_expression | hash_literal_expression | literal | type_literal | variable;
Which part do you suggest I modify? How?
Mar 30, 2013 at 1:52 AM
just looking over your grammar on github. Quick question:

_type_spec_array.Rule = array_type_name + MakeStarRule(_type_spec_array, ToTerm(",")) + "]";

_type_spec_generic.Rule = generic_type_name + generic_type_arguments + "]";

are you missing "[" somewhere?
Mar 30, 2013 at 1:54 AM
That's taken straight from the language spec, and appears correct, if you consider:
    type_name   [

    type_name   [
Mar 30, 2013 at 2:30 AM
Edited Apr 2, 2013 at 3:18 AM
I have resolved a couple more shift-reduce errors, and now I'm down to two, in this experimental branch: https://github.com/JayBazuzi/Pash/tree/shift-reduce-errors

EDIT: Those changes have been merged in to master
Apr 2, 2013 at 3:24 AM
I looked at little more closely at the type_literal shift/reduce error.
  • I found that if I remove cast_expression from expression_with_unary_operator, the shift/reduce error goes away.
  • If I remove type_literal from value, the shift/reduce error goes away.
I think the grammar would be OK with more lookahead, but Irony is LR(1). (right?)
Apr 2, 2013 at 5:30 AM
well, you might be fixing conflict, but in fact just delaying/hiding the problem - it will show up when you start parsing
on previous post, about array_type_name and generic_type_name. that's exactly the problem I'm trying to point out. Copying the official grammar definitions might be real trouble. I see that you implement these type-names using regexTerminal. First problem: I'm pretty sure that 'official' grammar definition assumes that spaces are allowed before the "[". Does your regex allow this?! Another thing. By merging the "[" into the name itself (making it part of the terminal), you effectively hiding it from the parser; the symbol is essentially a special symbol, paired brace in fact, but parser does not know about it at all. So you might lose some ability of the parser to make decisions based on this symbol. And you cannot use brace-pairing check.
So my advice - remove "[" from the terminal, here and in all similar cases. As for the rest of the name, the spec says about Unicode char categories - c# has the same stuff, so look at TerminalFactory.CreateCSharpIdentifier for an example. Try to leave a single identifier terminal in grammar used for all kind of names. Having more than one is a source of big trouble - when you start parsing, the scanner will randomly pickup one of your several identifiers, use it to produce token and that's it - it has no way to know which one is better, they all 'look' the same.
Apr 2, 2013 at 8:58 PM
Thanks Roman.

Regarding spaces before the [: no, the PowerShell grammar does not allow that for generics and array type names. For example:
> New-Object MyCollection[string]                             # same as "new MyCollection<string>()" in C#
> New-Object MyCollection [string]                            # same as "new MyCollection(typeof(string))" in C#
They are semantically different, although I think you're saying that the scanner should see them as being the same, and the difference should come when I analyze the parse tree later on. Right?

There are other cases where whitespace is not allowed:
> [int]::MaxValue     # OK

> [int]:: MaxValue    # ERROR in PowerShell v1 and v2, OK in v3

> [int] ::MaxValue    # ERROR
> [int]::Parse( '7' )     # OK

> [int]::Parse ( '7' )    # ERROR
But at least in these cases, they are an error, not an alternate semantic interpretation.

I think I understand your guidance to make all identifiers in to the same type of token. I'll try that out.
Apr 2, 2013 at 9:11 PM
Edited Apr 2, 2013 at 9:11 PM
By the way, I want to comment on my use of RegexTerminal. I used that because it's easy to translate the token definitions in the language spec to regex. Eventually I plan to migrate away from them, which I expect will improve performance and simplify my code, but that's not the top priority at the moment.

Correct behavior comes before performance.