Another newbie question

Dec 12, 2012 at 1:04 PM
Edited Dec 12, 2012 at 2:39 PM

I'm new to the whole world of formal language definitions and grammars, and maybe I'm using the wrong tool for the job, but I'd like to use Irony because I've been playing with it and it's just cool...

I want to parse files that should have a certain structure. The file is separated into sections, with each section having a series of options. Options can have values which are double-quoted strings or comma-separated lists of one or more integers, decimal numbers or "heads". A "head" has the form (positive int):(positive int), e.g. 1:6 or 19:3. Comments start with a ; and run until the end of the line. The whole file might look something like this:

[section1]   ; section titles are surrounded by square brackets
option1    =    7
option2="some string value"

; blank lines are ignored


heads   =1:2,8:1,3:2
another_option = 1.0,2.04,0.01

Each option or section title must be on its own line. Sections cannot be nested.
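To pin down the line-level rules just described, here is a plain C# sketch that uses no Irony at all. `LineKind`, `IniLines`, `StripComment` and `Classify` are hypothetical names; it assumes a ; inside a double-quoted value is not a comment, and it deliberately leaves the value syntax unchecked:

```csharp
// Plain C#, no Irony: a hypothetical line classifier for the format above.
public enum LineKind { Blank, Section, Option, Invalid }

public static class IniLines
{
    // Cuts the line at the first ';' that is not inside double quotes.
    public static string StripComment(string line)
    {
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            if (line[i] == '"') inQuotes = !inQuotes;
            else if (line[i] == ';' && !inQuotes) return line.Substring(0, i);
        }
        return line;
    }

    // Blank or comment-only lines are Blank, "[title]" is a Section and
    // "name = value" is an Option; the value itself is not validated here.
    public static LineKind Classify(string line, out string name, out string value)
    {
        name = null;
        value = null;
        string text = StripComment(line).Trim();
        if (text.Length == 0) return LineKind.Blank;

        if (text[0] == '[' && text[text.Length - 1] == ']')
        {
            name = text.Substring(1, text.Length - 2).Trim();
            return name.Length > 0 ? LineKind.Section : LineKind.Invalid;
        }

        int eq = text.IndexOf('=');
        if (eq <= 0) return LineKind.Invalid;
        name = text.Substring(0, eq).Trim();
        value = text.Substring(eq + 1).Trim();
        return value.Length > 0 ? LineKind.Option : LineKind.Invalid;
    }
}
```

For example, `Classify("option1    =    7", out var name, out var value)` returns `LineKind.Option` with `name == "option1"` and `value == "7"`.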

My attempt at a grammar in Irony looked like this:

var stringValue = new IdentifierTerminal("String value");
var quotedString = new StringLiteral("Quoted string", "\"");
var _decimal = new NumberLiteral("Decimal");
var integer = new NumberLiteral("Integer", NumberOptions.IntOnly | NumberOptions.NoDotAfterInt);

var root = new NonTerminal("Root");
var section = new NonTerminal("Section");
var sectionTitle = new NonTerminal("Section title");
var options = new NonTerminal("Options");
var option = new NonTerminal("Option");
var optionName = new NonTerminal("Option name");
var optionValue = new NonTerminal("Option value");
var head = new NonTerminal("Head");
var multipleHeads = new NonTerminal("Multiple heads");
var multipleInts = new NonTerminal("Multiple ints");
var multipleDecimals = new NonTerminal("Multiple decimals");

var comma = ToTerm(",");
var col = ToTerm(":");
var eq = ToTerm("=");

var lb = ToTerm("[");
var rb = ToTerm("]");

RegisterBracePair("[", "]");

MarkPunctuation(lb, rb, eq, comma, col);

var comment = new CommentTerminal("Comment", ";", Environment.NewLine);

root.Rule = MakePlusRule(root, NewLinePlus, section); // NewLinePlus will skip over blank lines?

section.Rule = sectionTitle + NewLinePlus + options;
sectionTitle.Rule = lb + stringValue + rb;
options.Rule = MakePlusRule(options, NewLinePlus, option);
option.Rule = optionName + eq + optionValue;

optionName.Rule = stringValue;
optionValue.Rule = quotedString | multipleInts | multipleDecimals | multipleHeads;
head.Rule = integer + col + integer;
multipleInts.Rule = MakePlusRule(multipleInts, comma, integer);
multipleDecimals.Rule = MakePlusRule(multipleDecimals, comma, _decimal);
multipleHeads.Rule = MakePlusRule(multipleHeads, comma, head);

Root = root;

This gives the following conflict: "Shift-reduce conflict. State S12, lookaheads [LF]. Selected shift as preferred action."

State S12 (Inadequate)
  Shift-reduce conflicts on inputs: LF
  Shift items:
    Options -> Options ·LF+ Option
    LF+ -> ·LF
    LF+ -> ·LF+ LF
  Reduce items:
    Section -> Section title LF+ Options · [EOF LF]
  Transitions: LF+->S17, LF->S7

I'm not sure how to resolve this, or whether my approach is completely wrong. I'm also wondering whether my handling of the number literals is correct: will the parser ever confuse integers and decimals?

Thanks in advance

Edit: I'm pretty certain my handling of the numbers is not correct. I think I was just being lazy and leaning on Irony's magic to sort it out for me, since it doesn't report a conflict for them after all. Would I need to extend the grammar a lot to do this properly? Cheers
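The integer/decimal overlap can be seen outside Irony too. The sketch below tests a complete option value against regex versions of the four value forms from the description above; the patterns (unsigned numbers, no exponents, no spaces inside a list, heads as positive int:int) and the `ValueForms` name are assumptions, not anything Irony provides:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: reports every value form that a whole option value matches.
public static class ValueForms
{
    private static readonly (string Name, Regex Pattern)[] Forms =
    {
        ("quoted string", new Regex(@"^""[^""]*""\z")),
        ("integer list",  new Regex(@"^[0-9]+(,[0-9]+)*\z")),
        ("decimal list",  new Regex(@"^[0-9]+(\.[0-9]+)?(,[0-9]+(\.[0-9]+)?)*\z")),
        ("head list",     new Regex(@"^[1-9][0-9]*:[1-9][0-9]*(,[1-9][0-9]*:[1-9][0-9]*)*\z")),
    };

    public static List<string> Match(string value)
    {
        var hits = new List<string>();
        foreach (var (name, pattern) in Forms)
            if (pattern.IsMatch(value)) hits.Add(name);
        return hits;
    }
}
```

`ValueForms.Match("7")` returns both "integer list" and "decimal list", while `"1:2,8:1,3:2"` only matches "head list". A bare integer is also a legal decimal, so a grammar offering both forms needs something extra to choose between them.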

Dec 14, 2012 at 6:36 AM

Make NewLinePlus part of the rules (at the end) instead of using it as the list delimiter:


      root.Rule = MakePlusRule(root, section); // blank lines are absorbed by the NewLinePlus that ends each line rule
      section.Rule = sectionTitle + options;
      sectionTitle.Rule = lb + stringValue + rb + NewLinePlus;
      options.Rule = MakePlusRule(options, option);
      option.Rule = optionName + eq + optionValue + NewLinePlus;

Just checked: no conflicts, and it parses your sample 'successfully', except that it errors on the colon-delimited list in the option value "1:2,8:1,3:2".

But that is a different story. Give 'integer' a higher priority:

integer.Priority = TerminalPriority.High;

That fixes it.

In general, I would recommend avoiding two conflicting number terminals. It's better to define just the decimal one and then, after parsing, verify that wherever only an integer is allowed there is in fact an integer.
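A minimal sketch of that post-parse check, working on the token text. `NumberChecks` and its methods are hypothetical, and it assumes '.' as the decimal separator with no signs or exponents:

```csharp
using System.Globalization;

// Hypothetical post-parse checks on number token text.
public static class NumberChecks
{
    // Anything the single "decimal" terminal should accept, e.g. "7" or "2.04".
    public static bool IsDecimal(string text) =>
        decimal.TryParse(text, NumberStyles.AllowDecimalPoint, CultureInfo.InvariantCulture, out _);

    // Digits only (NumberStyles.None), so "1.0" is rejected where an integer is required.
    public static bool IsInteger(string text) =>
        int.TryParse(text, NumberStyles.None, CultureInfo.InvariantCulture, out _);

    // Both halves of a head such as "19:3" must be positive.
    public static bool IsPositiveInteger(string text) =>
        int.TryParse(text, NumberStyles.None, CultureInfo.InvariantCulture, out int n) && n > 0;
}
```

Checking the text rather than the parsed value means `1.0` is rejected where an integer is required even though it is numerically whole; comparing `decimal.Truncate(value) == value` instead would accept it.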


Dec 14, 2012 at 1:31 PM
Edited Dec 14, 2012 at 1:41 PM

Thanks Roman, that did work. I've modified the grammar to drop the separate number types, and also to allow more general strings for the section title and option name non-terminals; these should be allowed to contain any characters except whitespace, square brackets, double quotes and equals signs, for example:

1_option = 1

1_option\\1 = 2:2,3:4

The modified grammar uses a RegexBasedTerminal instead of the IdentifierTerminal:

var generalString = new RegexBasedTerminal("General string", "[^\\s\"=\\[\\]]+");


sectionTitle.Rule = lb + generalString + rb + NewLinePlus;


optionName.Rule = generalString;
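A quick way to check what that pattern accepts is to anchor it and try a few names. `GeneralString` and `IsValidName` are hypothetical; the ^ and \z anchors are added here so the whole string must match:

```csharp
using System.Text.RegularExpressions;

// The asker's character class [^\s"=\[\]]+, anchored for whole-string checks.
public static class GeneralString
{
    private static readonly Regex Whole = new Regex("^[^\\s\"=\\[\\]]+\\z");

    public static bool IsValidName(string text) => Whole.IsMatch(text);
}
```

`IsValidName("1_option")` and `IsValidName(@"1_option\1")` return true, while `"a b"`, `"a=b"` and `"[x]"` return false. Note that ; is not in the excluded set, so `"opt;x"` is accepted as a single name.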


Is this the correct approach?

Out of interest, I notice Irony tends to use public fields in favour of properties. Is this a design choice for performance reasons, or does it simply reflect the alpha stage of the project?

Thanks a lot for your help

Dec 14, 2012 at 11:57 PM

Using a regex: why not? Just keep in mind that it is slower than the alternatives (like IdentifierTerminal with extra allowed symbols).

Using fields: I'm one of the (I think) few folks who don't see any reason to use properties when a field works fine. The only reason is 'religious' BS: thou shalt not use fields :)

Fields are faster, the source code is more compact, and the binaries are smaller as a result. And a field is easy to replace with a property if the need ever arises.


Dec 20, 2012 at 12:52 PM
Edited Dec 20, 2012 at 12:53 PM

A little off topic, but the point of using properties instead of fields is that a field isn't easy to change later. Yes, it's easy to change a field to a property in your own code, but if other people's code depends on yours, you may start to have problems. Specifically, someone might use reflection to grab the field and then use it; if you change it to a property, that breaks.
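A small sketch of the reflection point, using hypothetical `WithField` and `WithProperty` classes to stand for the same member before and after the ';' => '{ get; set; }' change:

```csharp
using System;

// Hypothetical "before" and "after" versions of the same public member.
public class WithField { public int Priority; }
public class WithProperty { public int Priority { get; set; } }

public static class MemberLookup
{
    // Type.GetField(name) only returns public fields; an auto-property's
    // compiler-generated backing field is private, so it is not found.
    public static bool HasPublicField(Type type, string name) => type.GetField(name) != null;

    public static bool HasPublicProperty(Type type, string name) => type.GetProperty(name) != null;
}
```

`typeof(WithField).GetField("Priority")` finds the member, but `typeof(WithProperty).GetField("Priority")` returns null, so caller code written as `GetField("Priority").SetValue(obj, 5)` would throw a `NullReferenceException` after the change.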

I too don't see a huge benefit in using properties everywhere, but they make sense in APIs that are meant to be widely used and that someone might inspect with reflection (which is likely here, since this project attracts people interested in language design, who are therefore likely to be interested in reflection). It also doesn't make the code less compact: all you need to add (on the same line) is { get; set; }.

As for performance: the get and set methods seem to be inlined nearly always anyway, so there should be no performance impact, and even if they aren't inlined, the tiny overhead they add wouldn't matter except in the most performance-critical sections of code.

I agree that most people use them religiously, but that's because they don't understand what's going on and smarter people tell them they should use them (the C# designers added the { get; set; } shortcut specifically because people should use properties in APIs).

Dec 20, 2012 at 6:39 PM

I'm one of the religious followers of Propertyism: it allows you to control access to the setter and to react to properties being set. Even if you don't want to react to a property change right now, you may want to in the future. Having conceded that, making everything in the public interface a property then makes things uniform.

I do a lot of my work in WPF, so properties come in pretty handy there too. It's quite rare in my code to have a plain old field-style { get; set; } property.
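As an illustration of reacting to a property being set, here is a sketch of the usual WPF-friendly pattern with `INotifyPropertyChanged` (from `System.ComponentModel`); `SectionModel` and `Title` are hypothetical names:

```csharp
using System.ComponentModel;

// Hypothetical view-model class whose setter notifies listeners such as WPF bindings.
public class SectionModel : INotifyPropertyChanged
{
    private string _title = "";

    public event PropertyChangedEventHandler PropertyChanged;

    public string Title
    {
        get => _title;
        set
        {
            if (_title == value) return; // no notification when nothing changes
            _title = value;
            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(Title)));
        }
    }
}
```

Setting `Title` to a new value raises `PropertyChanged` once with `PropertyName == "Title"`; assigning the same value again raises nothing. A plain field offers no place to hook this in.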

Dec 20, 2012 at 7:58 PM

Religious issues aside, here are the statistics: since the project was first published (4 years ago), there has been a single (one, uno, 1 !!!) request to change a field to a property, because the object was used in WPF binding in some user-built tool, and I happily satisfied it. I just changed ';' => ' {get;set;}'. Nobody ever noticed: no other code changes, no test breaks, nothing. I don't think it's even worth discussing. It's just personal coding style, like brace placement or indent size, with no impact on anything.

YAGNI - remember?