C99 grammar - conflicts with statement and declaration

Jul 31, 2014 at 6:32 PM
I'm having difficulty with conflicts on declaration versus statement. The following rule fails to detect a pointer declaration with initializer.
blockItemList.Rule = MakePlusRule(blockItemList, blockItem);
blockItem.Rule = declaration | statement;
The type of line it's failing on would be:
MyType *x = foo();
When I remove labeledStatement and expressionStatement from statement's rule (both may start with identifier), this type of declaration is recognized correctly. I fixed labeledStatement by starting it with a regex terminal rather than identifier...

What's the best way to force Irony to exhaust the declaration rule first before trying statement? Or, can I add to the grammar as Irony parses so that it can register MyType as a terminal rather than an identifier? I think I may fundamentally misunderstand how this is all supposed to work, because I thought it would try a different rule if one fails.
Coordinator
Aug 1, 2014 at 6:10 PM
There's currently no way to 'force Irony to exhaust' one rule and, on failure, try another. That would be backtracking (exploring multiple paths), and basic LALR doesn't do it. I suspect there are grammar conflicts there: refactor your grammar and fix the conflicts before you start parsing. The conflict error message points you to the problem in your grammar. When the parser has multiple paths to go, it chooses an arbitrary one, and apparently that's not what you wanted. So refactor the grammar and fix the conflicts. As for how to refactor, read about LALR shift/reduce conflicts - google it.
Aug 1, 2014 at 6:25 PM
Edited Aug 1, 2014 at 6:27 PM
Thanks for the reply. I'll be the first to admit I'm not the best at this stuff. At the same time, I'm not sure how to make C grammar non-ambiguous using Irony. It easily gets confused between "(type cast)(type cast)a" and "(type cast)(a * b)". This is because identifier is used for both type and variable name. Is there a recommended way to dynamically add and remove types and variables from the grammar, so that identifier doesn't need to be used? This would greatly disambiguate my grammar. I read somewhere that GCC used to use a symbol table hack for their LALR parsing of C.
Aug 1, 2014 at 9:36 PM
It seems a lot of my problems were due to the Q() method, and a lot are being solved by my symbol table hack... but if you have a better suggestion for that, I'm all for it.
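For anyone reading along, here is a minimal sketch of what a symbol table hack of this kind usually looks like. It is not my actual code and it uses no Irony API: `TypeNameTable`, `IdentKind` and `Classify` are made-up names. The idea is that the scanner asks the table whether an identifier is a known typedef name and emits a different token kind if so, which lets the grammar tell `MyType *x` from `a * b`. It ignores the case where an inner-scope variable shadows a typedef.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds; the names are illustrative, not Irony's.
public enum IdentKind { Identifier, TypeName }

// Scoped set of typedef names that a scanner could consult.
public sealed class TypeNameTable
{
    private readonly Stack<HashSet<string>> _scopes = new Stack<HashSet<string>>();

    public TypeNameTable() { _scopes.Push(new HashSet<string>()); }

    public void EnterScope() { _scopes.Push(new HashSet<string>()); }

    public void LeaveScope()
    {
        // Never pop the global scope.
        if (_scopes.Count > 1) _scopes.Pop();
    }

    // Called when a typedef (or struct/enum tag, if desired) is parsed.
    public void DeclareType(string name) { _scopes.Peek().Add(name); }

    // Innermost scope is checked first; Stack<T> enumerates from the top.
    public IdentKind Classify(string name)
    {
        foreach (var scope in _scopes)
            if (scope.Contains(name)) return IdentKind.TypeName;
        return IdentKind.Identifier;
    }
}

public static class Demo
{
    public static void Main()
    {
        var types = new TypeNameTable();
        types.DeclareType("MyType");
        Console.WriteLine(types.Classify("MyType")); // TypeName
        Console.WriteLine(types.Classify("x"));      // Identifier

        types.EnterScope();
        types.DeclareType("Local");
        types.LeaveScope();
        Console.WriteLine(types.Classify("Local"));  // Identifier
    }
}
```

The table has to be updated while parsing, since a typedef only takes effect for the tokens that follow it.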

New question: how to handle pre-processor includes? I want to come upon the symbol #include "header.h" and replace it with the contents of header.h without running a preprocessor, if possible. Can I dynamically change the parser source text?
Coordinator
Aug 6, 2014 at 7:49 PM
Hi again
About the C language and the problems with parsing it with LALR: unfortunately, C syntax does not fit well with LALR in general. There are even claims that the C grammar (and especially C++) is not context-free, so it does not belong to the category of grammars suited to this kind of parser. You mention the type cast expressions - this is the kind of thing that makes it non-context-free: the parser has to take the semantic context into consideration when interpreting symbols.
As for preprocessor commands - there's no support for this in Irony, and providing support for this (particularly includes) would require significant changes to Scanner.
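If only quoted includes matter, one workaround that needs no Scanner changes is to expand them into a single string before parsing. The sketch below is an assumption about how that could look, not an Irony feature: `IncludeExpander` and the `readFile` callback are hypothetical names. It handles only `#include "file"` lines, with no macros, conditionals, include guards or angle-bracket includes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class IncludeExpander
{
    // Matches lines of the form: #include "file"
    private static readonly Regex IncludeLine =
        new Regex("^\\s*#\\s*include\\s+\"([^\"]+)\"\\s*$");

    // readFile is an assumed callback that maps a file name to its text.
    public static string Expand(string fileName, Func<string, string> readFile)
    {
        var output = new StringBuilder();
        Expand(fileName, readFile, new HashSet<string>(), output);
        return output.ToString();
    }

    private static void Expand(string fileName, Func<string, string> readFile,
                               HashSet<string> active, StringBuilder output)
    {
        // Guard against a file that includes itself, directly or indirectly.
        if (!active.Add(fileName))
            throw new InvalidOperationException("Circular include: " + fileName);

        using (var reader = new StringReader(readFile(fileName)))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                Match m = IncludeLine.Match(line);
                if (m.Success)
                    Expand(m.Groups[1].Value, readFile, active, output);
                else
                    output.Append(line).Append('\n');
            }
        }
        active.Remove(fileName);
    }
}

public static class IncludeDemo
{
    public static void Main()
    {
        // In-memory "files" stand in for the disk in this example.
        var files = new Dictionary<string, string>
        {
            { "main.c", "#include \"header.h\"\nMyType *x = foo();\n" },
            { "header.h", "typedef int MyType;\n" }
        };
        string text = IncludeExpander.Expand("main.c", name => files[name]);
        Console.Write(text);
        // typedef int MyType;
        // MyType *x = foo();
    }
}
```

The expanded string can then be parsed as ordinary source text. The cost is that line numbers in error messages refer to the expanded text, not to the original files.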

I'm working on a new version, and support for a C-style preprocessor with macros and includes is being considered too. It's all still in the design phase, and I'm not sure when I can produce anything downloadable - many other things keep me away from this. Sorry for the bad news.
Roman
Sep 30, 2014 at 2:34 AM
Edited Sep 30, 2014 at 2:35 AM
Well, it's not ideal, but I've gotten things to a useable point by making a symbol table hack and then porting jcpp to C#. My actual grammar isn't ready for release, but here's a sample of hijacking the source for processing (in case anyone is this crazy). Irony's been a great learning tool, but long-term I should probably rewrite as recursive-descent...
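using System;
using System.Diagnostics;   // Stopwatch
using System.Reflection;    // MethodInfo, PropertyInfo, BindingFlags
using System.Text;          // StringBuilder
using Irony;                // LogMessageList
using Irony.Parsing;
using CppNet;               // Preprocessor, Feature, Warning

// Subclass that swaps in a preprocessed source stream; it reaches Irony's non-public members via reflection.
public class Parser : Irony.Parsing.Parser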
    {
        static MethodInfo _thisParseAll;
        static MethodInfo _thisReset;
        static Parser()
        {
            _thisParseAll = typeof(Irony.Parsing.Parser).GetMethod("ParseAll", BindingFlags.NonPublic | BindingFlags.Instance);
            _thisReset = typeof(Irony.Parsing.Parser).GetMethod("Reset", BindingFlags.NonPublic | BindingFlags.Instance);
        }

        public Parser(Grammar grammar) : base(grammar){
        }

        private void Reset()
        {
            _thisReset.Invoke(this, null);
        }

        private void ParseAll()
        {
            _thisParseAll.Invoke(this, null);
        }

        public new ParseTree Parse(string fileName)
        {
            Reset();
            Context.Source = new PreprocessedSourceStream(fileName);
            Context.SetCurrentParseTree(new ParseTree(null, fileName));
            Context.SetStatus(ParserStatus.Parsing);
            var sw = new Stopwatch();
            sw.Start();
            ParseAll();
            //Set Parse status
            var parseTree = Context.CurrentParseTree;
            bool hasErrors = parseTree.HasErrors();
            if(hasErrors)
                parseTree.SetStatus(ParseTreeStatus.Error);
            else if(Context.Status == ParserStatus.AcceptedPartial)
                parseTree.SetStatus(ParseTreeStatus.Partial);
            else
                parseTree.SetStatus(ParseTreeStatus.Parsed);
            //Build AST if no errors and AST flag is set
            bool createAst = this.Language.Grammar.LanguageFlags.IsSet(LanguageFlags.CreateAst);
            if(createAst && !hasErrors)
                Language.Grammar.BuildAst(Language, parseTree);
            //Done; record the time
            sw.Stop();
            parseTree.ParseTimeMilliseconds = sw.ElapsedMilliseconds;
            if(parseTree.ParserMessages.Count > 0)
                parseTree.ParserMessages.Sort(LogMessageList.ByLocation);
            return parseTree;
        }
    }

    public static class ParsingContextHacks
    {

        static PropertyInfo _propertyStatus;
        static PropertyInfo _propertyCurrentParseTree;
        static ParsingContextHacks()
        {

            _propertyStatus = typeof(Irony.Parsing.ParsingContext).GetProperty("Status");
            _propertyCurrentParseTree = typeof(Irony.Parsing.ParsingContext).GetProperty("CurrentParseTree");
        }

        public static void SetStatus(this ParsingContext context, ParserStatus status)
        {
            _propertyStatus.SetValue(context, status);
        }

        public static void SetCurrentParseTree(this ParsingContext context, ParseTree parseTree)
        {
            _propertyCurrentParseTree.SetValue(context, parseTree);
        }
    }

    public static class ParseTreeHacks
    {
        static PropertyInfo _propertyStatus;

        static ParseTreeHacks()
        {
            _propertyStatus = typeof(Irony.Parsing.ParseTree).GetProperty("Status");
        }

        public static void SetStatus(this ParseTree context, ParseTreeStatus status)
        {
            _propertyStatus.SetValue(context, status);
        }
    }

// ISourceStream implementation that pulls text lazily from the CppNet preprocessor.
public class PreprocessedSourceStream : ISourceStream
    {
        CppNet.Preprocessor _preprocessor;
        StringBuilder _buffer;
        bool _eof;

        public PreprocessedSourceStream(string fileName)
        {
            var pp = new Preprocessor();
            pp.addFeature(Feature.DIGRAPHS);
            pp.addFeature(Feature.TRIGRAPHS);
            pp.addFeature(Feature.OBJCSYNTAX);
            pp.addWarning(Warning.IMPORT);
            pp.addFeature(Feature.INCLUDENEXT);
            pp.setListener(new PreprocessorListener());

            pp.getSystemIncludePath().Add(@"C:\XcodeDefault.xctoolchain\usr\include");
            pp.getSystemIncludePath().Add(@"C:\XcodeDefault.xctoolchain\usr\lib\clang\6.0\include");
            pp.getFrameworksPath().Add(@"C:\iPhoneOS8.0.sdk\System\Library\Frameworks");
            pp.getSystemIncludePath().Add(@"C:\iPhoneOS8.0.sdk\usr\include");

            pp.addMacro("__AARCH64_SIMD__");
            pp.addMacro("__ARM64_ARCH_8__");
            pp.addMacro("__ARM_NEON__");
            pp.addMacro("__LITTLE_ENDIAN__");
            pp.addMacro("__REGISTER_PREFIX__", "");
            pp.addMacro("__arm64", "1");
            pp.addMacro("__arm64__", "1");

            pp.addMacro("__APPLE_CC__", "6000");
            pp.addMacro("__APPLE__");
            //pp.addMacro("TARGET_CPU_ARM64", "1");
            pp.addMacro("__GNUC__", "4");
            pp.addMacro("OBJC_NEW_PROPERTIES");
            pp.addMacro("__STDC_HOSTED__", "1");
            pp.addMacro("__MACH__");
            Version version = new Version("8.0.0.0");
            // System.Version is Major.Minor.Build.Revision; the patch level is Build, the third component.
            pp.addMacro("__ENVIRONMENT_IPHONE_OS_VERSION_MIN_REQUIRED__", string.Format("{0:0}{1:00}{2:00}", version.Major, version.Minor, version.Build));

            pp.addMacro("__STATIC__");


            pp.addInput(new CppNet.FileLexerSource(fileName));
            _preprocessor = pp;

            _buffer = new StringBuilder();
        }

        public Irony.Parsing.Token CreateToken(Terminal terminal)
        {
            var tokenText = GetPreviewText();
            return new Irony.Parsing.Token(terminal, this.Location, tokenText, tokenText);
        }
        public Irony.Parsing.Token CreateToken(Terminal terminal, object value)
        {
            var tokenText = GetPreviewText();
            return new Irony.Parsing.Token(terminal, this.Location, tokenText, value);
        }

        public bool EOF()
        {
            return _eof && _location.Position == _buffer.Length;
        }

        SourceLocation _location;
        public SourceLocation Location
        {
            get
            {
                return _location;
            }
            set
            {
                _location = value;
            }
        }

        public bool MatchSymbol(string symbol)
        {
            FillTo(_previewPosition + symbol.Length);
            if(_buffer.Length < _previewPosition + symbol.Length) {
                return false;
            }
            // Ordinal comparison: source text should match symbols exactly, independent of the current culture.
            return string.CompareOrdinal(_buffer.ToString(_previewPosition, symbol.Length), 0, symbol, 0, symbol.Length) == 0;
        }

        public char NextPreviewChar
        {
            get
            {
                FillTo(_previewPosition + 10);
                if(_buffer.Length <= _previewPosition + 1) { return '\0'; }

                return _buffer[_previewPosition + 1];
            }
        }

        public int Position
        {
            get
            {
                return _location.Position;
            }
            set
            {
                _location.Position = value;

            }
        }

        public char PreviewChar
        {
            get
            {
                FillTo(_previewPosition + 10);
                if(_buffer.Length <= _previewPosition) { return '\0'; }

                return _buffer[_previewPosition];
            }
        }

        int _previewPosition;
        public int PreviewPosition
        {
            get
            {
                return _previewPosition;
            }
            set
            {
                _previewPosition = value;
            }
        }

        public string Text
        {
            get {
                FillTo(Math.Max(_buffer.Length + 1000, _previewPosition));
                return _buffer.ToString();
            }
        }


        private void FillTo(int position)
        {
            if(_eof) {
                return;
            }
            while(_buffer.Length < position) {
                CppNet.Token token = _preprocessor.token();
                if(token.getType() == CppNet.Token.EOF) {
                    _eof = true;
                    return;
                }
                _buffer.Append(token.getText());
            }
        }

        private string GetPreviewText()
        {
            FillTo(_previewPosition);
            var until = _previewPosition;
            if(until > _buffer.Length) until = _buffer.Length;
            var p = _location.Position;
            string text = _buffer.ToString(p, until - p);
            return text;
        }
    }
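
The core trick in PreprocessedSourceStream is that it never reads the whole preprocessed file up front. It pulls tokens only until the buffer covers the position the scanner asks about. Stripped of Irony and CppNet, the same idea can be sketched as below. `LazyTextBuffer` and its members are illustrative names, and a plain sequence of strings stands in for the preprocessor's token stream.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Buffers text from a token sequence only as far as the reader has asked for.
public sealed class LazyTextBuffer
{
    private readonly IEnumerator<string> _tokens;
    private readonly StringBuilder _buffer = new StringBuilder();
    private bool _eof;

    public LazyTextBuffer(IEnumerable<string> tokens)
    {
        _tokens = tokens.GetEnumerator();
    }

    public int BufferedLength { get { return _buffer.Length; } }

    // Pulls tokens until at least `position` characters are buffered or input ends.
    private void FillTo(int position)
    {
        while (!_eof && _buffer.Length < position)
        {
            if (_tokens.MoveNext()) _buffer.Append(_tokens.Current);
            else _eof = true;
        }
    }

    // Returns the character at `index`, or '\0' past the end of input.
    public char CharAt(int index)
    {
        FillTo(index + 1);
        return index < _buffer.Length ? _buffer[index] : '\0';
    }

    // True only once the token source is exhausted and `index` is past the text.
    public bool IsEndOfInput(int index)
    {
        FillTo(index + 1);
        return _eof && index >= _buffer.Length;
    }
}

public static class LazyDemo
{
    public static void Main()
    {
        var buffer = new LazyTextBuffer(new[] { "int", " ", "x", ";" });
        Console.WriteLine(buffer.CharAt(0));        // i
        Console.WriteLine(buffer.BufferedLength);   // 3
        Console.WriteLine(buffer.CharAt(4));        // x
        Console.WriteLine(buffer.IsEndOfInput(6));  // True
    }
}
```

Because whole tokens are appended, the buffer can run a little past the requested position, which is harmless for lookahead.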