precedence / priority of keywords

Sep 3, 2012 at 9:03 PM

Hello

My language needs to be able to understand the letters a-g as terminals. These letters should be able to be written together in any order with or without spaces.

I also have keywords, some of which use these letters. It seems that if I define the letters a-g as terminals, the parser always gives them precedence, so that any keyword containing a letter a-g causes a parse error. What I need is a way to tell the parser to prioritise the keywords.

Please see below a simplified example:

example of what I am trying to parse:

abc for( def )for

example code:

var note = new RegexBasedTerminal("note", "[a-g]");

// Declare all non-terminals first so the rules can refer to each other.
var statement = new NonTerminal("Statement");
var statementList = new NonTerminal("statementList");
var loop = new NonTerminal("Loop");

statement.Rule = loop | note;
statementList.Rule = MakePlusRule(statementList, statement);
loop.Rule = ToTerm("for(") + statementList + ToTerm(")for");

this.Root = statementList;

The above example produces a parse error like "Invalid character: 'o'. " because the parser interprets my keyword 'for(' as the note 'f' followed by 'or(' instead of as the keyword I intended.

I have tried setting priority / precedence on the keywords and terminals but this seems to have no effect.

I hope I have expressed this clearly enough.

Any help on this would be greatly appreciated !

thanks,

Julian

Coordinator
Sep 5, 2012 at 6:58 AM

At first look, the problems I see:

Do not use RegexBasedTerminal for such a simple case; it is better to express note directly as:

note.Rule = ToTerm("a") | "b" | "c" | .... | "g";

Do not include parentheses in terminals; use each one as a separate symbol and define the loop statement as

loop.Rule = ToTerm("for") + "(" + statementList + ")" + "for";

One really strange thing, which may be the cause of all the trouble: you don't have any explicit delimiter between statements, or a statement-end symbol?

It does not look like you are implementing an existing language; are you making up your own? If so, I would suggest going back and rethinking the syntax to make the structures more explicit. That makes them easier to express in BNF and, as a side effect, easier to document.

Roman

Sep 5, 2012 at 12:01 PM

Hi Roman

Thanks very much for your reply.

The language I am trying to recreate is a music programming language from 1987 called Ample (http://www.colinfraser.com/m5000/ample-nucleus-pg.pdf)

You are quite right - there are no explicit delimiters between statements. Whitespace is allowed purely to make the code more humanly readable.

I think it works by prioritising what it interprets, with priority given to 'words' (which can be 'system' words, like the for loop statement, or 'user-defined' words) over musical notes. Notes are expressed with the letters a-g or A-G (upper/lower case indicating a raising/lowering of pitch) with an optional + or - prefix (indicating flatted or sharpened notes). The example I gave for note was a simplification - the full regex (which works nicely) is [+-]?[a-gA-G].

The following represents the tune 'BA-BA Black Sheep' in Ample (/ represents an empty beat):

c/c/G/G/ABCag///f/f/e/e/d/d/c///

This tune could be defined as a user word using the command:

"BABA" NAME MAKE

Once that word has been created, any further reference to BABA will play the tune instead of the notes B A B A.

BABABABA

would play the tune twice over.

BABABA
would play the tune once, followed by the notes B A.

Ample does have a lot of potential for ambiguity by nature. While this is a weakness on one hand, its flexibility makes for great creative possibilities.

Is it likely that Irony will not be able to cope with this type of prioritised approach to parsing, where there are no explicit delimiters between components?

Do you have any suggestions for how I could recreate this, or any workarounds (apart from requiring explicit delimiters)?

Thanks again

Julian

Coordinator
Sep 6, 2012 at 5:59 PM

That's a tough case. The problem I see is that you introduce identifiers on the fly, like "BABA": you introduce it as a string literal (in quotes), but then expect it to appear anywhere without quotes as an identifier. I think it is doable, but you have to do some extra work.

I think you should create a custom NoteTerminal for your notes and tune names; inherit it from Terminal. It will recognize your "notes" or extra names defined like "BABA". This NoteTerminal should hold the list of allowed notes/tunes (initially just the "A".."G" notes). Extra custom tunes can then be added when you parse the tune declaration nonterminal:

TuneDecl.Rule = StrLiteralName + "NAME" + "MAKE";

Hook into the TuneDecl.Reduced event: in the event handler, find the parsed tune name (it is the first child of the parsed node) and add it to the list of names in NoteTerminal. NoteTerminal should scan the input using a "longest-first" order of note/tune names, so it should re-sort the names each time a name is added.
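
The longest-first lookup described above can be sketched independently of Irony. This is only a minimal sketch: NoteNameTable and its members are hypothetical names, not part of Irony, and the wiring into a custom Terminal and into the TuneDecl.Reduced handler is left out. The idea is that the terminal would ask the table for the longest known name at the current position, and the handler would call Add for each newly declared tune.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an Irony class): holds the known note and tune
// names and always tries the longest name first.
public class NoteNameTable
{
    private readonly List<string> _names = new List<string>();

    public NoteNameTable(IEnumerable<string> initialNames)
    {
        foreach (var name in initialNames) Add(name);
    }

    // Add a name and re-sort so that longer names are tried before shorter ones.
    public void Add(string name)
    {
        if (string.IsNullOrEmpty(name) || _names.Contains(name)) return;
        _names.Add(name);
        _names.Sort((x, y) => y.Length.CompareTo(x.Length));
    }

    // Length of the longest known name starting at 'position', or 0 if none matches.
    public int MatchLengthAt(string input, int position)
    {
        foreach (var name in _names)
        {
            if (string.CompareOrdinal(input, position, name, 0, name.Length) == 0)
                return name.Length;
        }
        return 0;
    }

    // Split the input into names, skipping whitespace between them.
    public List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        int pos = 0;
        while (pos < input.Length)
        {
            if (char.IsWhiteSpace(input[pos])) { pos++; continue; }
            int length = MatchLengthAt(input, pos);
            if (length == 0)
                throw new FormatException(
                    "Invalid character '" + input[pos] + "' at position " + pos);
            tokens.Add(input.Substring(pos, length));
            pos += length;
        }
        return tokens;
    }
}
```

With the notes "A".."G" registered and "BABA" added afterwards, Tokenize("BABABA") yields BABA, B, A, which matches the behaviour described for Ample earlier in the thread.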

Try this, I think it will work

Roman

Sep 8, 2012 at 8:50 AM

Maybe do some preprocessing? Since each statement can be recognized on its own, you could create a "one big switch" state machine that automatically puts spaces between statements.
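
A rough sketch of that preprocessing idea, using an ordered regular expression in place of a hand-written state machine. It assumes the only tokens are the `for(` and `)for` keywords from the first post, notes with an optional + or - prefix, and `/` for an empty beat; AmplePreprocessor is a hypothetical name. .NET tries alternatives from left to right, so listing the keywords first gives them priority over single-letter notes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical preprocessor: rewrites the source with one space between tokens.
public static class AmplePreprocessor
{
    // Keywords come first so they win over the single-letter note pattern.
    private static readonly Regex Token =
        new Regex(@"for\(|\)for|[+-]?[a-gA-G]|/");

    public static string InsertSpaces(string source)
    {
        var parts = new List<string>();
        int pos = 0;
        foreach (Match m in Token.Matches(source))
        {
            // Anything skipped between two tokens must be whitespace.
            if (source.Substring(pos, m.Index - pos).Trim().Length != 0)
                throw new FormatException("Unrecognized text at position " + pos);
            parts.Add(m.Value);
            pos = m.Index + m.Length;
        }
        if (source.Substring(pos).Trim().Length != 0)
            throw new FormatException("Unrecognized text at position " + pos);
        return string.Join(" ", parts);
    }
}
```

For the input `abc for( def )for` this returns `a b c for( d e f )for`, which a grammar with whitespace-delimited tokens could then parse.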

Sep 21, 2012 at 6:38 PM
Edited Sep 21, 2012 at 6:40 PM

Ha! It sounds like you are working on something very similar to the Q-Basic ABC variant that I have just written a parser and synthesizer for.

link: http://en.wikibooks.org/wiki/QBasic/Appendix#PLAY

This (working) grammar may give you some ideas:

         #region 1-Terminals
         Terminal modePlay   = new RegexBasedTerminal("modePlay",   @"M[BF]");   // NO-OPs
         Terminal mode       = new RegexBasedTerminal("mode",       @"M");
         Terminal modeStyle  = new RegexBasedTerminal("modeStyle",  @"[NLS]");

         Terminal tempo      = new RegexBasedTerminal("tempo",      @"T");
         Terminal length     = new RegexBasedTerminal("length",     @"L");
         Terminal octave     = new RegexBasedTerminal("octave",     @"O");
         Terminal integer    = new RegexBasedTerminal("integer",    @"[0-9]+");
         Terminal shift      = new RegexBasedTerminal("shift",      @"O*[<>]");

         Terminal note       = new RegexBasedTerminal("note",       @"N");
         Terminal rest       = new RegexBasedTerminal("rest",       @"P");
         Terminal noteLetter = new RegexBasedTerminal("noteLetter", @"[ABCDEFG]");
         Terminal sharpFlat  = new RegexBasedTerminal("sharpFlat",  @"[-#+]");
         Terminal dots       = new RegexBasedTerminal("dots",       @"\.+");

         MarkPunctuation(modePlay, NewLine);
         #endregion 1-Terminals

         #region 2-Nonterminals
         NonTerminal MusicList   = new NonTerminal("MusicList",   typeof(MusicListNode));
         NonTerminal Music       = new NonTerminal("Music");
         NonTerminal Directions  = new NonTerminal("Directions",  typeof(DirectionsNode));
         NonTerminal Direction   = new NonTerminal("Direction");
         NonTerminal Tempo       = new NonTerminal("Tempo",       typeof(TempoNode));
         NonTerminal Length      = new NonTerminal("Length",      typeof(LengthNode));
         NonTerminal ModeStyle   = new NonTerminal("ModeStyle",   typeof(StyleNode));
         NonTerminal Octave      = new NonTerminal("Octave");
         NonTerminal OctaveNo    = new NonTerminal("OctaveNo",    typeof(OctaveNoNode));
         NonTerminal OctaveShift = new NonTerminal("OctaveShift", typeof(OctaveShiftNode));

         NonTerminal Note        = new NonTerminal("Note");
         NonTerminal Rest        = new NonTerminal("Rest",        typeof(RestNode));
         NonTerminal NoteMod     = new NonTerminal("NoteMod",     typeof(NoteModNode));
         NonTerminal LetterNote  = new NonTerminal("Note",        typeof(LetterNoteNode));
         NonTerminal NumberNote  = new NonTerminal("NumberNote",  typeof(NumberNoteNode));

         NonTerminal SharpFlat   = new NonTerminal("SharpFlat");
         NonTerminal NoteValue   = new NonTerminal("NoteValue");
         NonTerminal Dots        = new NonTerminal("DotExpr");

         MarkTransient(Music, Direction, Octave, Note, SharpFlat, NoteValue, Dots);
         #endregion 2-Nonterminals

         #region 3-Rules
         Root             = MusicList;

         MusicList.Rule   = MakeStarRule(MusicList, Music);
         Music.Rule       = Directions | Note;
         Directions.Rule  = MakePlusRule(Directions, Direction);
         Direction.Rule   = ModeStyle | Tempo | Length | Octave
                          | PreferShiftHere() + modePlay
                          | PreferShiftHere() + NewLine;
         Tempo.Rule       = PreferShiftHere() + tempo + integer;
         Length.Rule      = PreferShiftHere() + length + integer;
         Octave.Rule      = OctaveNo | OctaveShift;
         OctaveNo.Rule    = PreferShiftHere() + octave + integer;
         OctaveShift.Rule = PreferShiftHere() + octave + shift;
         ModeStyle.Rule   = PreferShiftHere() + mode + modeStyle;

         Note.Rule        = NumberNote | Rest | LetterNote;
         NumberNote.Rule  = note + integer;
         Rest.Rule        = rest + NoteValue + Dots;
         LetterNote.Rule  = NoteMod + NoteValue + Dots;
         NoteMod.Rule     = noteLetter + SharpFlat;

         SharpFlat.Rule   = Empty | sharpFlat;
         NoteValue.Rule   = Empty | integer;
         Dots.Rule        = Empty | dots;

         Direction.ErrorRule = SyntaxError + NewLine;
         #endregion 3-Rules