Simple grammar with shift reduce problem

Jan 12, 2013 at 12:14 AM
Edited Jan 12, 2013 at 12:28 AM

Hi,

I'm trying to create a grammar for the following syntax:

 

 

@some simple text $SomeExpression other text again until newline

$SomeOtherExpression

$YetSomeOtherExpression

@and again some simple text $WithAnotherExpression

@etc

 

 

The logic behind this is:

 * Each line contains a set with expressions - either a "TextExpression" or a "DollarExpression"

 * The expressions within a line are treated like this:

 * TextExpression - continues until the "$" character, which is a separator

 * DollarExpression - is initiated with the "$" character, followed by an identifier text

 * The "DollarExpression" can be on a single line

  A line can thus consist of multiple expressions if started with the "@" character, or with a single "DollarExpression"

 

So, I thought that the following "pseudogrammar" would work for this:

 

 

Program = list(LineExpression)

LineExpression = DollarLineExpression | AtLineExpression

AtLineExpression = @ + list(AtExpression)

AtExpression = TextExpression | $ + DollarExpression

TextExpression = "Text including whitespace"

DollarLineExpression = list(DollarExpression)

DollarExpression = IdentiferExpression

IdentifierExpression = Identifier

 

 

I implemented this with Irony with the following grammar (you should be able to copy paste this to an empty irony project of yours):

 

 

using System;
using Irony.Interpreter;
using Irony.Parsing;

namespace SlickScript.Language
{

    [Language("my grammar", "0.1", "my grammar")]
    public class MyGrammar : InterpretedLanguageGrammar
    {
        public MyGrammar()
            : base(true)
        {
            GrammarComments = "my grammar.";

            #region terminals

            //Number
            var NumberLiteral = new NumberLiteral("Number");
            NumberLiteral.DefaultIntTypes = new TypeCode[] { TypeCode.Int32 };
            NumberLiteral.Options |= NumberOptions.IntOnly;

            //String
            var StringLiteral = new StringLiteral("String", "\"", StringOptions.AllowsAllEscapes);
            
            //Keywords
            var TrueKeyword = ToTerm("true", "true");
            var FalseKeyword = ToTerm("false", "false");

            //Line comment
            var LineComment = new CommentTerminal("LineComment", "//", "\n", "\r");
            NonGrammarTerminals.Add(LineComment);

            //Block comment
            var BlockComment = new CommentTerminal("BlockComment", "/*", "*/");
            NonGrammarTerminals.Add(BlockComment);

            #endregion


            //Simple terminals definitions
            //var Identifier = new RegexTerminal("Identifier", @"[a-zA-Z_][a-zA-Z_0-9]*", typeof(Identifier), "");
            //Identifier.AstConfig.NodeType = typeof(Identifier);
            var Identifier = new RegexBasedTerminal("Identifier", @"[a-zA-Z_][a-zA-Z_0-9]*");
            var SimpleValue = new NonTerminal("SimpleValue");
            var StringValue = new NonTerminal("StringValue");
            var NumberValue = new NonTerminal("NumberValue");
            var BoolValue = new NonTerminal("BoolValue");
            var Text = new RegexBasedTerminal("Text", @"[^@$}\n]");

            //Non terminals definitions
            var Program = new NonTerminal("Program");
            var LineExpression = new NonTerminal("LineExpression");
            var LineExpressions = new NonTerminal("LineExpressions");
            var DollarLineExpression = new NonTerminal("DollarLineExpression");
            var AtLineExpression = new NonTerminal("AtLineExpression");
            var AtExpression = new NonTerminal("AtExpression");
            var AtExpressions = new NonTerminal("AtExpressions");
            var TextExpression = new NonTerminal("TextExpression");
            var DollarExpression = new NonTerminal("DollarExpression");
            var DollarExpressions = new NonTerminal("DollarExpressions");
            var IdentifierExpression = new NonTerminal("IdentifierExpression");

            //Rules

            //Simple
            StringValue.Rule = StringLiteral;

            NumberValue.Rule = NumberLiteral;

            BoolValue.Rule = TrueKeyword | FalseKeyword;

            SimpleValue.Rule = StringValue | NumberValue | BoolValue;

            //Advanced
            Program.Rule = LineExpressions;

            LineExpressions.Rule = MakeStarRule(LineExpressions, LineExpression);

            LineExpression.Rule = DollarLineExpression | AtLineExpression;

            AtLineExpression.Rule = "@" + AtExpressions;

            AtExpressions.Rule = MakeStarRule(AtExpressions, AtExpression);

            AtExpression.Rule = TextExpression | DollarExpression;

            TextExpression.Rule = Text;

            DollarLineExpression.Rule = DollarExpressions;

            DollarExpressions.Rule = MakeStarRule(DollarExpressions, DollarExpression);

            DollarExpression.Rule = "$" + IdentifierExpression;

            IdentifierExpression.Rule = Identifier;

            Root = Program;
            LanguageFlags = LanguageFlags.CreateAst;
        }
    }
}

 

However the grammar doesn't even seem to compile in the GrammarExplorer - i'm getting the following errors:

 

Shift-reduce conflict. State S0, lookaheads [$ @]. Selected shift as preferred action.
Reduce-reduce conflict. State S0, lookaheads: EOF. Selected reduce on first production in conflict set.
Shift-reduce conflict. State S3, lookaheads [$ @]. Selected shift as preferred action.
Reduce-reduce conflict. State S3, lookaheads: EOF. Selected reduce on first production in conflict set.
Shift-reduce conflict. State S7, lookaheads [$]. Selected shift as preferred action.
Shift-reduce conflict. State S11, lookaheads [$]. Selected shift as preferred action.
Shift-reduce conflict. State S18, lookaheads [$]. Selected shift as preferred action.

 

I don't really know exactly why I'm getting this (I do understand it has something to do with the lookahead characters which cannot be deduced from my grammar?)

 

Could someone help me out? - would be really appreciated! :)

What do I need to correct in order to get this working? Do I need to restructure the grammar somehow? (factoring of some sort?)

 

Thank you,

Btw, fantastic toolkit :)

 

Jan 12, 2013 at 1:01 AM
Edited Jan 12, 2013 at 1:02 AM

Short update:

 

Ok, the issue seems to be due to the recursive rules?

 

Changing the following gets me down to only two errors (though the grammar should still work I think):

 

DollarLineExpression.Rule = DollarExpressions;

TO:

DollarLineExpression.Rule = DollarExpression;

 

Since it's already implicitly included in the top declaration:

 

LineExpressions.Rule = MakeStarRule(LineExpressions, LineExpression);

 

 

It feels like I'm having the wrong "approach" - as if my logic is more kind of "recursive" and I somehow need to "flatten" it for the LALR parsing?

 

The following errors remain:

Shift-reduce conflict. State S9, lookaheads [$]. Selected shift as preferred action.
Shift-reduce conflict. State S15, lookaheads [$]. Selected shift as preferred action.

 

Thank you,

Jan 12, 2013 at 3:47 PM

Yay, I solved it. 

 

It seems like the problem was that some rules couldn't be deduced due to same starting terminals... the following works:

 

using System;
using Irony.Interpreter;
using Irony.Parsing;


namespace SlickScript.Language
{

    [Language("my grammar", "0.1", "my grammar")]
    public class MyGrammar : InterpretedLanguageGrammar
    {
        public MyGrammar()
            : base(true)
        {
            GrammarComments = "my grammar.";

            #region terminals

            //Number
            var NumberLiteral = new NumberLiteral("Number");
            NumberLiteral.DefaultIntTypes = new TypeCode[] { TypeCode.Int32 };
            NumberLiteral.Options |= NumberOptions.IntOnly;

            //String
            var StringLiteral = new StringLiteral("String", "\"", StringOptions.AllowsAllEscapes);
            
            //Keywords
            var TrueKeyword = ToTerm("true", "true");
            var FalseKeyword = ToTerm("false", "false");

            //Line comment
            var LineComment = new CommentTerminal("LineComment", "//", "\n", "\r");
            NonGrammarTerminals.Add(LineComment);

            //Block comment
            var BlockComment = new CommentTerminal("BlockComment", "/*", "*/");
            NonGrammarTerminals.Add(BlockComment);

            #endregion

            UsesNewLine = true;

            //Simple terminals definitions
            //var Identifier = new RegexTerminal("Identifier", @"[a-zA-Z_][a-zA-Z_0-9]*", typeof(Identifier), "");
            //Identifier.AstConfig.NodeType = typeof(Identifier);
            var Identifier = new RegexBasedTerminal("Identifier", @"[a-zA-Z_][a-zA-Z_0-9]*");
            var SimpleValue = new NonTerminal("SimpleValue");
            var StringValue = new NonTerminal("StringValue");
            var NumberValue = new NonTerminal("NumberValue");
            var BoolValue = new NonTerminal("BoolValue");
            var Text = new RegexBasedTerminal("Text", @"[^@$}\r\n\v]*");

            //Non terminals definitions
            var Program = new NonTerminal("Program");
            var LineExpressions = new NonTerminal("LineExpressions");
            var LineExpression = new NonTerminal("LineExpression");
            var AtOrDollarLineExpression = new NonTerminal("AtOrDollarLineExpression");
            
            var DollarLineExpression = new NonTerminal("DollarLineExpression");
            var AtLineExpression = new NonTerminal("AtLineExpression");
            var AtExpression = new NonTerminal("AtExpression");
            var AtTextExpression = new NonTerminal("AtTextExpression");
            var AtTextOrDollarExpression = new NonTerminal("AtTextOrDollarExpression");
            var AtExpressions = new NonTerminal("AtExpressions");
            var TextExpression = new NonTerminal("TextExpression");
            var DollarExpression = new NonTerminal("DollarExpression");
            var IdentifierExpression = new NonTerminal("IdentifierExpression");

            //Rules

            //Simple
            StringValue.Rule = StringLiteral;

            NumberValue.Rule = NumberLiteral;

            BoolValue.Rule = TrueKeyword | FalseKeyword;

            SimpleValue.Rule = StringValue | NumberValue | BoolValue;

            //Advanced
            Program.Rule = LineExpressions;

            LineExpressions.Rule = MakeStarRule(LineExpressions, LineExpression);

            LineExpression.Rule = AtOrDollarLineExpression + NewLine;

            AtOrDollarLineExpression.Rule = AtLineExpression | DollarLineExpression;

            AtLineExpression.Rule = "@" + AtExpressions;

            AtExpressions.Rule = MakeStarRule(AtExpressions, AtExpression);

            AtExpression.Rule = TextExpression | (TextExpression + DollarExpression) | (TextExpression + DollarExpression + AtTextExpression);

            AtTextExpression.Rule = "@" + TextExpression;

            TextExpression.Rule = Text;

            DollarLineExpression.Rule = DollarExpression;

            DollarExpression.Rule = "$" + IdentifierExpression;

            IdentifierExpression.Rule = Identifier;

            MarkTransient(AtOrDollarLineExpression);
            
            Root = Program;
            LanguageFlags = LanguageFlags.Default;
        }
    }
}

Thanks,

Coordinator
Jan 14, 2013 at 6:48 AM

Great! - happy you solved it. Just a few notes. In general, to save yourself from conflicts, you should:

1. Avoid 'replicating' rules like 'A.Rule = B'  - either A or B is not needed and should be removed from the grammar

2. Carefully choose rules for optional/(can be empty) non-terminals - proliferation of these causes ambiguities in grammar, and conflicts as a result. Star-lists (0 or more) combined with optional elements define the language that allows extra empty parse nodes in between real constructs, and these empty nodes are in fact sources of conflicts in parser automaton.

One more thing. Your language seems to be 'line-based', so line break is an active syntax element (like Basic or Python). There are some special facilities/options for these languages in Irony, have a look at GwBasic and MiniPython grammars, see if you can use them - and I think you should.

Good luck!

Roman 

Jan 15, 2013 at 10:34 PM

Thanks for the comments - I will have a look at the grammars you suggested and play around with them :)