Recovering from unexpected EOF

Sep 27, 2010 at 9:01 PM


I am having trouble adding an error rule to recover from an unexpected EOF so that outlining continues to work when the file does not parse correctly. The top-level snippet of my grammar is below. For example, when one syntactically correct CFB Math block is followed by one that is missing its #END_CFB; I lose outlining. I have tried a myriad of error rules and nothing seems to work! Another pair of eyes and some insight would be greatly appreciated.

var CFB_BEGIN = Keyword("#CFB");
var CFB_END = Keyword("#END_CFB");
var INIT = Keyword("init");

NonTerminal BodyPortion = new NonTerminal("body portion");
NonTerminal CFBMathBlock = new NonTerminal("CFB Math Block");
NonTerminal InitPortion = new NonTerminal("init portion");
NonTerminal TimerDeclaration = new NonTerminal("Timer Declaration");

this.Root = ProgramContent;
ProgramContent.Rule = (CFBMathBlock).Star();
DeclarationSection.Rule = (DeclarationSection.Rule | TimerDeclaration).Star();
TimerDeclaration.Rule = TIMER + (SLOW | FAST).Q() + ":" + NameList + EQ + INTEGERNUMBER + "," + INTEGERNUMBER + "," + BOOL.Q() + "," + BOOL.Q() + ";";

ExecutableSection.Rule = InitPortion | BodyPortionNoInit | (InitPortion + BodyPortion);

// BodyPortionNoInit defined in the CommonGrammar
InitPortion.Rule = INIT + (Statement).Plus();
BodyPortion.Rule = BODY + (Statement).Plus();

CFBMathBlock.Rule =
    CFB_BEGIN + Name +
    CodeSpecifierSection.Q() +
    DeclarationSection.Q() +
    BEGIN +
    ExecutableSection.Q() +
    CFB_END + ";";

Many Thanks,


Sep 27, 2010 at 11:10 PM

First of all, please get rid of all Star(), Plus() and Q() functions. They are deprecated because they don't work properly in many cases; use MakeStarRule and MakePlusRule instead. Also, please define explicit non-terminals for sub-expressions (like SLOW|FAST).

I don't see any error rules set up.

Sep 28, 2010 at 4:34 PM

Thanks a lot for the tip about the deprecated API. I have modified the grammar to use Empty, MakeStarRule, and MakePlusRule.

The reason I did not include any error rules is that the only place that would make sense for an unexpected EOF situation is the top-most rule, which has EOF by default. However, it did not work. I was also wondering why the expected set was null instead of one that includes "#end_cfb".

So after implementing your suggestions above I dug into the CoreParser class, specifically the private method TryRecoverFromError(). This method basically prevents recovery when hitting an unexpected EOF. I wonder if this is correct behavior. For example, it causes the entire tree to be lost when all of your source is correct with the exception of a missing "close" (e.g. #end_cfb) at the end of the file. I am including the updated grammar below. Maybe it can help with pointing me in the right direction. I really appreciate all your help.

var CFB_BEGIN = Keyword("#CFB");
var CFB_END = Keyword("#end_CFB");
var INIT = Keyword("init");

NonTerminal BodyPortion = new NonTerminal("body portion");
NonTerminal CFBDeclaration = new NonTerminal("CFB Declaration");
NonTerminal CFBDeclarations = new NonTerminal("CFB Declarations");
NonTerminal CFBMathBlock = new NonTerminal("CFB Math Block");
NonTerminal CFBMathBlocks = new NonTerminal("CFB Math Blocks");
NonTerminal InitPortion = new NonTerminal("init portion");
NonTerminal NonEmptyStatmentList = new NonTerminal("Statement+");

this.Root = ProgramContent;

ProgramContent.Rule = CFBMathBlocks;
ProgramContent.ErrorRule = SyntaxError + Eof;

CFBMathBlock.Rule = CFB_BEGIN + Name + CodeSpecifierSection + CFBDeclarations + BEGIN + ExecutableSection + CFB_END + ";";
CFBMathBlock.ErrorRule = SyntaxError + ";";

CFBMathBlocks.Rule = MakeStarRule(CFBMathBlocks, CFBMathBlock);

// DeclarationSection defined in the CommonGrammar
CFBDeclaration.Rule = DeclarationSection;
CFBDeclarations.Rule = MakeStarRule(CFBDeclarations, CFBDeclaration);

NonEmptyStatmentList.Rule = MakePlusRule(NonEmptyStatmentList, Statement);

ExecutableSection.Rule = Empty | InitPortion | BodyPortionNoInit | (InitPortion + BodyPortion);

// BodyPortionNoInit defined in the CommonGrammar

InitPortion.Rule = INIT + NonEmptyStatmentList;
BodyPortion.Rule = BODY + NonEmptyStatmentList;


Oct 13, 2010 at 5:57 AM

Well, very interesting question, and I don't have an immediate answer. This is a valid, real scenario (explicit program end token), but Irony's error recovery does not support it out of the box. You may try to write a token filter that injects this "end_CFB" token at the end of the file when 1) we are in error recovery and 2) end_cfb is not there.

Then your error recovery rules would work, I guess.
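
The injection idea can be sketched independently of Irony. The block below is only an illustration under stated assumptions: it works on plain strings rather than Irony Token objects, the class name EndTokenInjector is hypothetical, keywords are compared case-insensitively, and it checks only the second condition (the closing token is missing), not whether the parser is in error recovery.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Irony: operates on token text only.
public static class EndTokenInjector
{
    // Token spellings taken from the grammar in this thread.
    private const string Open = "#CFB";
    private const string Close = "#end_CFB";

    // Passes every token through unchanged. If the input ends while a
    // block is still open, appends the closing keyword and its ";".
    public static IEnumerable<string> Filter(IEnumerable<string> tokens)
    {
        bool insideBlock = false;
        foreach (var token in tokens)
        {
            if (string.Equals(token, Open, StringComparison.OrdinalIgnoreCase))
                insideBlock = true;
            else if (string.Equals(token, Close, StringComparison.OrdinalIgnoreCase))
                insideBlock = false;
            yield return token;
        }

        if (insideBlock)
        {
            // Injected tokens: the block was opened but never closed.
            yield return Close;
            yield return ";";
        }
    }
}
```

In a real grammar the same logic would have to live in a token filter that sits between the scanner and the parser and emits the injected tokens just before the end-of-file token; how to wire that up is not shown here.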


PS: sorry for the delayed response.

Jun 18, 2012 at 9:57 AM


Did you write the token filter?

Or did you find a workaround?


Nov 7, 2012 at 3:39 PM


Sorry to reopen this rather old thread, but I also found the above scenario quite disturbing. Parsing (with SyntaxError) works quite well, with the exception of the outermost construct: a missing "end;" or similar prevents the parser from reaching the "Recovered" state.

So I was wondering... why not just change the one return statement in ErrorRecoveryParserAction.cs::TryRecoverFromError() from "return false" to "return true" in case Eof is encountered? This would allow us to use the ParseTree we have so far. In case of syntax errors no one would expect the tree to be perfect, but in many cases I prefer a broken tree over no tree at all.

I guess I'm missing something important here, otherwise you'd surely have changed the code accordingly yourself by now :-)

Anyway, regards,


Nov 9, 2012 at 5:21 PM

There is a bigger and more general problem with error recovery. The currently implemented scheme is in line with file-based compilation. The purpose of recovery is to move to some next point where the compiler can continue and find more errors. And nobody cares about the parse tree once an error is encountered. That's the way it was originally built, following standard recipes from compiler books. The result is what you see: on EOF, there's no reason to recover, because nothing else is left to compile.

However, syntax highlighting and intellisense in a text editor are a completely different story. The recovery should try to produce some kind of parse tree no matter what, to allow intellisense to work as the user enters code in the editor. The general mechanism should be to inject "missing" tokens to complete the structures (like statements, methods, classes), so that some kind of tree can be produced and analyzed and, for example, name lists can be provided to intellisense dropdowns. This recovery with missing token injection is something for the future. Until then, just invent some workarounds.


Nov 9, 2012 at 6:54 PM

Hmm, looking at it that way seems reasonable :-)

Nevertheless, and for completeness, I want to mention my (preliminary) solution. At this stage, I cannot say if it will work under all circumstances (and especially as the USER would expect it to work), but setting aside future problems, that's what I did (using the modification I suggested above, because I want a tree even if EOF is reached). The advantage, as far as I am concerned, is that I don't need to inject missing tokens (which I guess would be far more complicated):

(a) When someone writes code, the "nice" thing is that most of the time (between user modifications) the file compiles. This is the point in time when a correct parse-tree can be generated. While the user modifies the file, the code usually won't compile, thus the generated parsetree has errors, which I can check by calling _parseTree.HasErrors(). In that case, I stick to the correct (old) tree, so intellisense still works with all the symbols from the correct file. As soon as the code compiles again, the parsetree is updated with a new, error-free version. This scenario works fine if the user manually types code. Making larger changes using copy-paste is a problem, because the pasted code (in case it contains, e.g. new variable definitions) will not be part of intellisense until the parsetree is correct again.

(a - remark) Always using the latest parsetree, even if incorrect, is a bad idea, because syntax errors can render many symbol definitions invalid, thus intellisense would definitely not work as the user would expect.

(b) Starting with an erroneous file that doesn't compile works much the same way, but I always have to use an erroneous tree, as there's nothing else available. In that case, I can always update the tree once I have a new one. The question whether the old one was better than the new one is difficult if not impossible to answer.

(c) To make things more sophisticated, it is also possible to use symbols from both the old and the new parse tree. That would solve some problems, but it would not only make things more complicated, it would also double the cost of scanning the parse trees for symbols and then looking up the symbols in two different symbol sets.
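
Rules (a) and (b) above can be sketched as a small holder class. This is a minimal illustration, not code from the project: LastGoodTree and Offer are hypothetical names, and the error check is passed in as a delegate so the sketch does not depend on any parser library.

```csharp
using System;

// Hypothetical holder for the tree that intellisense should read from.
public sealed class LastGoodTree<TTree> where TTree : class
{
    private readonly Func<TTree, bool> _hasErrors;
    private bool _currentIsClean;

    public LastGoodTree(Func<TTree, bool> hasErrors)
    {
        _hasErrors = hasErrors;
    }

    // The tree to use for symbol lookup; null until the first Offer.
    public TTree Current { get; private set; }

    // Called after every re-parse with the freshly produced tree.
    public void Offer(TTree candidate)
    {
        bool clean = !_hasErrors(candidate);
        // (a) An error-free tree always replaces the current one.
        // (b) An erroneous tree is accepted only while no error-free
        //     tree has been seen, since nothing better is available.
        if (clean || !_currentIsClean)
        {
            Current = candidate;
            _currentIsClean = clean;
        }
    }
}
```

With Irony this could presumably be instantiated as new LastGoodTree<ParseTree>(t => t.HasErrors()), using the HasErrors() call mentioned in (a).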


Nov 9, 2012 at 7:48 PM

There are several problems with this... First, "... until code compiles again...": this might never happen, or not happen for a long time. Imagine the user edits some class file and is working around method #1, while there's some syntax error at the end of the file. The user might complete method #1 (making it correct) and start typing method #2, but method #1's name will never appear in intellisense, because the error at the end prevents the entire file from ever being correct.

Another, very typical scenario: the user starts typing a new class from scratch, opens a namespace, declares a class and a method, and starts typing the body, while nothing exists after the typing point, so all closing braces are missing. There is no way the parser can build a parse tree (without injecting the missing closing braces). But intellisense should work in this case; at least in VS it works, I just checked.

So I think it needs more tweaking. An erroneous parse tree must be able to "expose" names that are estimated to be correct (like finished and correct property and field definitions) even when they are mixed up with complete garbage.
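
One possible reading of "exposing" correct names from a broken tree is sketched below. Everything here is an assumption for illustration: Node is a hypothetical tree type (not Irony's), and a name is treated as trustworthy simply when neither its node nor any ancestor was produced by error recovery.

```csharp
using System.Collections.Generic;

// Hypothetical tree node; not an Irony type.
public sealed class Node
{
    public string Name;      // declared name, or null if the node declares nothing
    public bool IsError;     // true for nodes produced by error recovery
    public List<Node> Children = new List<Node>();
}

public static class SymbolCollector
{
    // Walks the tree depth-first and yields declared names, skipping
    // every subtree rooted at an error node.
    public static IEnumerable<string> CollectNames(Node node)
    {
        if (node == null || node.IsError)
            yield break;
        if (node.Name != null)
            yield return node.Name;
        foreach (var child in node.Children)
            foreach (var name in CollectNames(child))
                yield return name;
    }
}
```

This only helps if the parser produces a tree at all, so it complements rather than replaces the missing-token injection discussed earlier in the thread.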