Fixed formatting grammar syntax

Aug 5, 2012 at 3:48 PM
Edited Aug 5, 2012 at 11:10 PM

I am very new to Irony. I downloaded it last night and spent about 6 hours figuring out the basics of how it works. It is very cool, as I have struggled for the past few years to work with BNF in C# and have had to come up with lame XML grammar definitions that I really do not like. It looks like Irony might solve a number of issues that I am having, and have had for many years. I ran into Irony while trying to work out the requirements for adding my scanner and parser to a .NET language package, and I see that Irony already supports this. Very nice.

Can Irony handle fixed position grammars?  Maybe something like...

    somevar          CS  "this is a string constant"

How can I specifically code for the "C" in a given position, say character position 25, where it marks the variable as a constant? My sample is not spaced correctly, but I think you can get the idea. The "C" is optional and might not be there at all. The "S" would be in position 26 and might tell the compiler this is a string. This is a common syntax style for fixed-format languages from the 1970s and 80s.
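
To make the layout concrete, here is a minimal sketch in plain C# (no Irony involved) of what reading those two columns means. The class and method names are hypothetical, and it assumes 1-based columns 25 and 26, with a blank or a short line meaning the flag is absent.

```csharp
using System;

// Hypothetical helper, not part of Irony: reads the two fixed-position flags.
public static class FixedColumns
{
    // Returns the character at a 1-based column, or ' ' if the line is too short.
    public static char CharAt(string line, int column)
    {
        int index = column - 1;
        return index >= 0 && index < line.Length ? line[index] : ' ';
    }

    // Column 25 holds the optional "C" (constant) flag.
    public static bool IsConstant(string line) => CharAt(line, 25) == 'C';

    // Column 26 holds the optional "S" (string) flag.
    public static bool IsString(string line) => CharAt(line, 26) == 'S';
}
```

For a line built as `"somevar".PadRight(24) + "CS  \"this is a string constant\""`, the "C" lands in column 25 and the "S" in column 26, so both checks succeed.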

Here's a VERY simple grammar that I came up with as a test to see if I can set up the BNF for fixed character positions. I had to put a space between the "C" and "S" to get the sample grammar below to work, but that will not work in the long run. I am not sure exactly why, but I do get a syntax error using the sample line above. I think I am misunderstanding some of the Irony terminal and non-terminal constructs and have something set up wrong here.

But I have no idea how to specify fixed character positions for the "C" and the "S". Is this possible with Irony? If not, that just means that I have to do the first pass using my existing parser, and a second pass can use Irony. That is not too bad, but I think it would be nice to have a single pass using Irony if it is possible. Maybe I could dump my scanner and parser and replace them with Irony. That would be VERY nice!

What I cannot figure out is a way to describe that the CONSTANT is in position 25. Other than that, both versions of this (one has been commented out) work. If I can just add the position requirement, I think Irony can work as a single-pass parser for my compiler. Too bad I did not run into this in 2004 when I wrote the compiler!

 

            var variable = new IdentifierTerminal("Identifier");            
            var stringLiteral = new StringLiteral("STRING", "\"", StringOptions.None);

            var PROGRAM = new NonTerminal("PROGRAM");
            var LINE = new NonTerminal("LINE");
            //var DATATYPE = new NonTerminal("DATATYPE");
            //var CONSTANT = new NonTerminal("LINE");
            var DATATYPE = new FixedLengthLiteral("DATATYPE", 1, TypeCode.String);
            var CONSTANT = new FixedLengthLiteral("CONSTANT", 1, TypeCode.String);
            //var CONSTANT = new RegexLiteral(@"\C\c[25]"); 
            this.Root = PROGRAM;
            PROGRAM.Rule = MakePlusRule(PROGRAM, LINE);
            //DATATYPE.Rule = "s";
            //CONSTANT.Rule = "c";
            LINE.Rule = NewLine | variable + CONSTANT + DATATYPE + stringLiteral;

 

Any thoughts or ideas would be much appreciated.

Best regards,

Jon

Coordinator
Aug 6, 2012 at 5:28 AM

OK, as far as I understand, a typical line starts with a label (somevar), then at a fixed position there is a 'line type' indicator like CS, then some expression.

Here's how you do it. 

Write your grammar as if there were no restrictions on positions. But do one thing: define a Label terminal (as an IdentifierTerminal) and use it as a special terminal that matches only the "somevar" labels at the beginning of a line. Do not use it anywhere else in the grammar; define a different terminal, "Variable", for identifiers inside expressions.

Define the LineType and Line nonterminals:

LineType.Rule = ToTerm("S") | "C" | "D" .... ;

Line.Rule = label + lineType + someExpr;

Now, the trick is to intercept the moment the label terminal is "scanned" and do some hand coding.

Hook the 'label.ValidateToken' event and do the following in the handler:

 

        void label_ValidateToken(object sender, ValidateTokenEventArgs e) {
          //Assuming lineType is at position 25
          var label = e.Context.CurrentToken;
          if (label.Location.Column > 0) {
            e.SetError("invalid position of label, must start at first column.");
            return; 
          }
          // (also check label length, check that there are only spaces until pos 25, etc)
          // advance source to linetype position 
          var lineStart = label.Location.Position;
          var src = e.Context.Source;
          src.Position = lineStart + 25;
          // Read current char
          var lineTypeStr = src.PreviewChar.ToString();
          //Manually produce lineType token
          // _lineType is class-level field containing line type terminal used in grammar
          Token lineTypeToken = new Token(_lineType, src.Location, lineTypeStr, lineTypeStr);
          //Pack labelToken and lineTypeToken into Multitoken
          var multi = new MultiToken(label, lineTypeToken);
          //Replace current token (label) with multitoken
          e.ReplaceToken(multi);
        }
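
The extra checks left as a comment in the handler (label length, only spaces up to the line-type column) can be sketched in plain C#, independent of Irony. The class and method names here are hypothetical, and the 0-based offset 25 is simply the one assumed in the handler above; adjust it if your columns are counted from 1.

```csharp
using System;

// Hypothetical helper, not part of Irony.
public static class LabelChecks
{
    // True when every character between the end of the label and lineTypeOffset
    // (exclusive) is a space. Offsets are 0-based from the start of the line.
    public static bool OnlySpacesBeforeLineType(string line, int labelLength, int lineTypeOffset)
    {
        // The label overruns the line-type column, or the line ends before it.
        if (labelLength > lineTypeOffset || line.Length <= lineTypeOffset)
            return false;

        for (int i = labelLength; i < lineTypeOffset; i++)
        {
            if (line[i] != ' ')
                return false;
        }
        return true;
    }
}
```

Inside the handler, a failed check would presumably be reported with e.SetError(...) in the same way as the column check on the label.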

Also add the following constructor to MultiToken in the Irony core (it's just an easier, clearer way to create it; I will push this change next time):

    public MultiToken(params Token[] tokens) : this(tokens[0].Terminal, tokens[0].Location, new TokenList()) {
        ChildTokens.AddRange(tokens);
    }

That should do it. Note that I did not test the code, so you may need to debug and tweak it. I hope the idea is clear enough.

Roman