Maybe I am missing something...

Feb 18, 2014 at 6:15 PM
Hello All,
I am implementing a language in Irony similar to (for now) C#. It was implemented from the C# 4.0 specification. It has issues when parsing method declarations. Specifically, it is failing to reduce to return_type. I have tested changing the return_type in method_header to type_ref and it then recognizes the declaration, but it fails for a void return type (for obvious reasons). I feel I am missing something and could use some other eyes on this. Any help would be greatly appreciated.
I have attached the complete grammar to the post in 'issues' of the same title.

// NonTerminals
type_ref, return_type, method_header, method_declaration

KeyTerm @void = ToTerm( "void" );
type_ref.Rule = value_type | reference_type | type_parameter;
return_type.Rule = type_ref | @void; method_header.Rule = member_header + return_type + member_name + type_parameter_list.Q() + open_par + formal_parameter_list.Q() + close_par + type_parameter_constraints_clauses.Q();
method_declaration.Rule = method_header + method_body;
Coordinator
Feb 19, 2014 at 6:09 AM
the first question to ask - did you test in GrammarExplorer, and did it show any grammar errors (shift/reduce conflicts)? If not, did you apply any hints or tricks to get rid of conflicts?
Feb 19, 2014 at 5:55 PM
Hey there. Thank you for the reply. I have tested it extensively in the GrammarExplorer and yes, there are many conflicts. My problem is, while I have read many documents on parsing and the specifics to LALR parsers, I fail to understand *why there is a conflict. Also, I fail to understand the proper placement of the PreferShiftHere() or ReduceHere(). The grammar I have implemented is spiritually identical to the C# 3.0 grammar you included in the samples. I used the .Q() method as opposed to the separate __*opt_ style in your example and the naming scheme follows the spec. The main difference in the method_declaration is that your example uses type_ref while mine uses return_type. The important difference (please correct me where I am wrong) is that the sample definition of type_ref includes and allowance for the "void" keyword/terminal where as this/my implementation and the spec do not (as void is only allowed for a method return type and hence, the additional non-terminal). I have tested the grammar by modifying it to use type_ref and added "void" to it. It does work, but that allows (grammatically) "void" to be placed almost anywhere and does not cause a syntax error.
The problem observed in the GrammarExplorer is that the state transitions through the reductions from integral_type->int (for example) all the way to type_ref (which is correct) but then stops and does not reduce further to return_type. I have no idea where to add a hint. The method_header is listed properly (GE output listed below).
Again, please excuse my ignorance. If you could link me to any reference material that would lead me in the right direction, I would be forever grateful.
State S934 __(Inadequate)__
  Shift-reduce conflicts on inputs: identifier
  Shift items:
    constant_declaration -> member_header ·const type_ref constant_declarators semi 
    field_declaration -> member_header ·type_ref variable_declarators semi 
    type_ref -> ·value_type 
    value_type -> ·struct_type 
    struct_type -> ·type_name 
    type_name -> ·namespace_or_type_name 
    namespace_or_type_name -> ·identifier Unnamed1 
    namespace_or_type_name -> ·namespace_or_type_name dot identifier Unnamed38 
    namespace_or_type_name -> ·qualified_alias_member 
    qualified_alias_member -> ·identifier colon_colon identifier Unnamed8 
    struct_type -> ·simple_type 
    simple_type -> ·numeric_type 
    numeric_type -> ·integral_type 
    integral_type -> ·sbyte 
    integral_type -> ·byte 
    integral_type -> ·short 
    integral_type -> ·ushort 
    integral_type -> ·int 
    integral_type -> ·uint 
    integral_type -> ·long 
    integral_type -> ·ulong 
    integral_type -> ·char 
    numeric_type -> ·floating_point_type 
    floating_point_type -> ·float 
    floating_point_type -> ·double 
    numeric_type -> ·real 
    simple_type -> ·bool 
    struct_type -> ·nullable_type 
    nullable_type -> ·non_nullable_value_type qmark 
    non_nullable_value_type -> ·type_ref 
    value_type -> ·enum_type 
    enum_type -> ·type_name 
    value_type -> ·pointer_type 
    pointer_type -> ·unmanaged_type astrisk 
    unmanaged_type -> ·integral_type 
    unmanaged_type -> ·floating_point_type 
    unmanaged_type -> ·char 
    unmanaged_type -> ·bool 
    pointer_type -> ·void astrisk 
    type_ref -> ·reference_type 
    reference_type -> ·class_type 
    class_type -> ·type_name 
    class_type -> ·object 
    class_type -> ·dynamic 
    class_type -> ·string 
    reference_type -> ·interface_type 
    interface_type -> ·type_name 
    reference_type -> ·array_type 
    array_type -> ·type_ref rank_specifiers 
    reference_type -> ·delegate_type 
    delegate_type -> ·type_name 
    type_ref -> ·type_parameter 
    type_parameter -> ·attributes identifier 
    attributes -> ·attribute_section+ 
    attribute_section+ -> ·attribute_section+ attribute_section 
    attribute_section+ -> ·attribute_section 
    attribute_section -> ·[ Unnamed2 attribute_list ] 
    constructor_declaration -> member_header ·constructor_declarator method_body 
    constructor_declarator -> ·identifier ( Unnamed55 ) Unnamed56 
    property_declaration -> member_header ·type_ref member_name { accessor_declarations } 
    extension_property_declaration -> member_header ·type_ref member_name [ this type_ref identifier ] { accessor_declarations } 
    event_declaration -> member_header ·event type_ref variable_declarators semi 
    event_declaration -> member_header ·event type_ref member_name { event_accessor_declarations } 
    indexer_declaration -> member_header ·indexer_declarator { accessor_declarations } 
    indexer_declarator -> ·type_ref this [ formal_parameter_list ] 
    __method_header -> member_header ·return_type member_name Unnamed43 ( Unnamed44 ) Unnamed47__ 
    return_type -> ·type_ref 
    return_type -> ·void 
    class_declaration -> member_header ·class identifier Unnamed50 Unnamed51 Unnamed53 class_body Unnamed58 
    struct_declaration -> member_header ·struct identifier Unnamed59 Unnamed60 Unnamed61 struct_body Unnamed62 
    interface_declaration -> member_header ·interface identifier Unnamed63 Unnamed65 Unnamed66 interface_body Unnamed74 
    enum_declaration -> member_header ·enum identifier Unnamed75 enum_body Unnamed77 
    delegate_declaration -> member_header ·delegate return_type identifier Unnamed78 ( Unnamed79 ) Unnamed80 semi 
Coordinator
Feb 20, 2014 at 6:21 AM
Edited Feb 20, 2014 at 10:55 AM
Ok, I see where you are in this. Here's the main points
  1. The main rule: do NOT start parsing until you resolve all conflicts in grammar explorer. Parsing errors you see - failing to parse seemingly correct input - is the direct result of these conflicts.
  2. You say " I fail to understand why there is a conflict. ". See below. But note, that placing 'PreferShiftHere' hints is not a solution, unless you know for sure what you're doing. These particular hints, non-conditional hints, are rarely usable, only when there's a clear language rule that dictates how to act in case of syntactic ambiguity. One such case is 'dangling else' (google it), and there's a rule that 'else' is always for the closest preceding 'if', so PreferShiftHere is a clear implementation of this 'non-syntax' rule. In cases like you describe, usually you need an extra work, the proper action is NOT always the same and depends on symbols down the stream or some extra info. Another thing BTW - do not use .Q() method, it is effectively deprecated, it is a useful notation in BNF, but causes a lot of problems in Irony grammar, so it's better not use it.
Here's a quick 101 on grammar conflicts. LALR(1) parser is a strict machine operating thru clear rules. Given a 'state' and an input (terminal), decide on reduce or shift action, perform an action, and move to the next state. The key here is 'decide', and ideally, for a given state and input there must always be a single action. Sometimes, there's more than 1 action, the parser is on a road fork, and has no clear choice between alternatives. These situations are detected before parsing anything, during parser state machine construction, and reported as conflicts. All conflicts you see on Grammar Errors page are such situations. Parser builder reports the conflict, but chooses some 'default' action, so it will be able to start parsing - just to give you a chance to fix it later, but it's decisions are not garanteed to be correct. That's what you see as result - wrong parser decisions.
Now the conflict itself - as far as I can see, it boils down to the following simplified situation. Suppose the parser read the following string (dot indicates the current position):

public SomeType . Foo

Right after the parser 'swallowed' SomeType, it must decide on reduce or shift action, with 'Foo' as an input. According to your grammar, it is type_ref; after reducing to type_ref, we're in the same place, but now we have a decision to make: is type_ref actually a return_type (so we parsing method), or it should be left as is (we are parsing field or property). It might be one of the following:

public SomeType Foo;
public SomeType Foo() { }
public SomeType Foo {get;}


It is easy to see why parser is in trouble here - no way to know, given the information (remember, it is LALR(1), so parser sees only 'Foo', and nothing else). And as you see, the conflict involves not only method_header (as you probably assumed in your first post), but also field and property headers.
There are 2 ways to deal with situation (at least, maybe more). BTW, ReduceHere or PreferShiftHere, fixed hints are no way to go - they will suppress the conflict but suggest fixed and wrong decision algorithm for parser.
First, you can make a quick run ahead and see what follows first - semicolon, parenthesis or curly brace. This is what is Grammar.ReduceIf method is doing - it adds a hint with custom action that runs ahead and finds the first character out of given set. c# sample grammar was put together before this hint was there, so it does not use it. Now, keep in mind that it might be not as simple as it seems - you might have more stuff jammed up right after Foo (generic arguments, comments, even conditionals or attributes?) which might make this lookup more complicated; so you have to use careful analysis of all possibilities when you use it. Essentially, it is a quick hack, workaround based on kind of euristic rule.
The other possibility is to include 'void' into type_ref, and let parser parse it all. To catch the wrong use of 'void' in property or field you can hook to Reduced event of method_header, property_header, etc, and in event handler run through parsed elements, detect that void was used on property or field, and add parse error - all tools are available there through event args. Apparently this check is missing from c# grammar. Keep in mind, c# grammar is a simplified SAMPLE, incomplete and not a real thing, just a demo to show how to build some basic stuff.
I hope this clears the trouble you have, and shows some way you can proceed. Good luck.
Sorry, all these troubles are not in general Irony's faults, it is foundational limitations of LALR algorithm - you would have the same trouble with other LALR-based tools. Although admittedly Irony could provide more built-in tricks to deal with stuff like this, one of them is GLR parsing - I wish I had more time to fix all this...
Good luck
Roman
Feb 20, 2014 at 7:50 PM
Hey Roman,
Thank you for all your efforts. You have helped me tremendously. With the information you have given here, I am sure I will be successful with this project. I will make sure to document heavily and post the results for your other users.