Parser in infinite loop on incorrect input

Nov 8, 2014 at 2:57 PM
Below is my grammar. If I give it, what I would call an invalid input - "test1:test2:test3", then it gets into an infinite loop and eats memory like crazy and never returns from the parse method ... I'd probably want it to error out, or maybe return "test1:test2" as a ModifierName and "test3" as the ModifierValue ... Any Ideas on how I would accomplish this?
   class SearchGrammar: Grammar
    {
        public SearchGrammar(): base(false)
        {
            // Terminals
            var Term = new IdentifierTerminal("Term", @"!@#$%^*_'.?-\/", @"!@#$%^*_\/'.?0123456789");
            Term.CharCategories.AddRange(new UnicodeCategory[] {
                UnicodeCategory.UppercaseLetter, //Ul
                UnicodeCategory.LowercaseLetter, //Ll
                UnicodeCategory.TitlecaseLetter, //Lt
                UnicodeCategory.ModifierLetter,  //Lm
                UnicodeCategory.OtherLetter,     //Lo
                UnicodeCategory.LetterNumber,     //Nl
                UnicodeCategory.DecimalDigitNumber, //Nd
                UnicodeCategory.ConnectorPunctuation, //Pc
                UnicodeCategory.SpacingCombiningMark, //Mc
                UnicodeCategory.NonSpacingMark,       //Mn
                UnicodeCategory.Format                //Cf
            });
            //StartCharCategories are the same
            Term.StartCharCategories.AddRange(Term.CharCategories); 


            var Phrase = new StringLiteral("Phrase", "\"");
            var ImpliedAnd = new ImpliedSymbolTerminal("ImpliedAnd");

            // Non-Terminals
            var BinaryExpression = new NonTerminal("BinaryExpression");
            var BinaryOp = new NonTerminal("BinaryOp");
            var NotExpression = new NonTerminal("NotExpression");
            var NotOp = new NonTerminal("NotOp");
            var Expression = new NonTerminal("Expression");
            var PrimaryExpression = new NonTerminal("PrimaryExpression");
            var ParenthesizedExpression = new NonTerminal("ParenthesizedExpression");
            var ModifierExpression = new NonTerminal("ModifierExpression");
            var SearchExpression = new NonTerminal("SearchExpression");

            var ModifierName = new NonTerminal("ModifierName");
            var ModifierValue = new NonTerminal("ModifierValue");
           
            this.Root = Expression;

            Expression.Rule = Empty | PrimaryExpression | BinaryExpression | NotExpression;
            
            BinaryExpression.Rule = Expression + BinaryOp + Expression;
            
            BinaryOp.Rule = ImpliedAnd | "and" | "&" | "or" | "|";

            NotExpression.Rule = NotOp + Expression;

            NotOp.Rule = ToTerm("-") | "not";

            PrimaryExpression.Rule = SearchExpression
                                    | ParenthesizedExpression
                                    | ModifierExpression;

            SearchExpression.Rule = Term | Phrase;

            ParenthesizedExpression.Rule = "(" + Expression + ")";

            ModifierExpression.Rule = ModifierName + ":" + ModifierValue;
            ModifierName.Rule = Term | Phrase;
            ModifierValue.Rule = Term | Phrase;

            MarkTransient(PrimaryExpression, ModifierName, ModifierValue, Expression, BinaryOp, NotOp);
            MarkPunctuation("(", ")", ":");
            RegisterOperators(10, "or", "|");
            RegisterOperators(20, "and", "&");
            RegisterOperators(20, ImpliedAnd);
            RegisterOperators(30, "-", "not");
            RegisterBracePair("(", ")");

            AddToNoReportGroup(ImpliedAnd);
            AddToNoReportGroup("(", ")");
        }
    }
Coordinator
Nov 12, 2014 at 7:26 PM
your trouble is a combination of ImpliedSymbol (ImpliedAnd) and the fact that empty string can be interpreted as an expression.
So parser when seeing a simple term "Term" starts thinking that this is in fact:
Ident ImpliedAnd EmptyExpr

which results in reduce to BinaryExpression then Expression, and then cycle starts again:
Expr -> Expr ImpliedAnd EmptyExpr

Remove Empty from Expression.Rule, and then everything starts working. Anyway I think it is wrong arrangement to have


As for catching this infinite loop, it probably might be caught, but it's a bit tricky, it is not detectable in parser automaton (like conflicts), only at runtime while actually parsing, the parser should notice that it does not move forward while repeating the same state. I'll think about smth like this in the new version
Roman