Easy question

Apr 30, 2009 at 8:58 PM
Here is my grammar:
            var distance = new NumberLiteral("distance", NumberFlags.AllowSign);
            var name = new IdentifierTerminal("name");

            var Tree = new NonTerminal("Tree");
            var SubTree = new NonTerminal("SubTree");
            var Leaf = new NonTerminal("Leaf");
            var Internal = new NonTerminal("Internal");
            var BranchList = new NonTerminal("BranchList");
            var Branch = new NonTerminal("Branch");
            var Length = new NonTerminal("Length");
            var Name = new NonTerminal("Name");

            Tree.Rule = SubTree + ";";
            SubTree.Rule = Leaf | Internal;
            Leaf.Rule = Empty | name;
            Internal.Rule = "(" + BranchList + ")" + Name;
            BranchList.Rule = Branch | Branch + "," + BranchList;
            Branch.Rule = SubTree + Length;
            Length.Rule = Empty | ":" + distance;
            Name.Rule = Empty | name;

            this.Root = Tree;       // Set grammar root
            RegisterPunctuation("(", ")");

The name terminal only accepts names that start with a letter, followed by letters, digits and "_". I would also like to accept names that start with a digit and names that contain blanks, "-" or other symbols. Currently (A:0.1,B:0.2,(C:0.3,D:0.4):0.5); is parsed, but (1A:0.1,B-Y:0.2,(C:0.3,D:0.4):0.5); is not.

How can I define this?
Coordinator
May 1, 2009 at 3:53 AM
Well, the question is easy but the answer is not. There is a simple answer, but it wouldn't quite work - yet. 
So the simple answer: modify IdentifierTerminal to accept extra first chars and extra internal chars. There are properties for this, and even a constructor overload with extra parameters for the values of these properties. Then the Id terminal would happily accept digits in the first position. However, that would break scanning. The problem is that two terminals (distance and name) would now allow a digit as the first character. The scanner (currently) selects the terminal(s) to try based on the current char in the input: it selects the terminals that declare this char as one of their "first" chars. So in your case, both the distance and name terminals would fit; the scanner would then apply the Priority value set on the terminals and pick the one with the higher priority. The problem is that in your grammar the right choice can be either of the two, depending on the situation, so it cannot be fixed through a static priority value. And if the priorities are the same, the scanner would try the terminals one by one and return the first that produces a token, which in your case is effectively random.
That is the bad news. The good news: an extra feature would help with your grammar, but it's not there yet, not even in the new "preview" drop. I'm talking about the Scanner-Parser link/advice facility. When in doubt, the scanner may ask the parser which terminals it expects in the current state, and the parser may filter the list of terminals to match. The base code is there already: each ParserState has an "ExpectedTerms" property that was created exactly for this.
In your grammar, it looks like numbers (distance) can appear only in certain positions, after a colon, so the parser would be able to resolve the ambiguity based on the current state. But as I said, it is not there yet, not completed.
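To make the selection logic described above concrete, here is a minimal, self-contained sketch. It does not use Irony's real classes: SketchTerminal, SelectCandidates and the "expected" set are hypothetical names invented for illustration, and the real scanner may well differ in detail. It only shows the idea of filtering by first char, optionally narrowing by what the parser expects, and then ordering by priority.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a terminal; NOT Irony's Terminal class.
public sealed class SketchTerminal
{
    public string Name { get; }
    public string FirstChars { get; }  // chars that may start a token of this terminal
    public int Priority { get; }

    public SketchTerminal(string name, string firstChars, int priority = 0)
    {
        Name = name;
        FirstChars = firstChars;
        Priority = priority;
    }
}

public static class ScannerSketch
{
    // 1. Keep terminals that declare the current char as a "first" char.
    // 2. If the parser supplies a set of expected terminal names, keep only those.
    // 3. Order by priority, highest first (ties keep declaration order).
    public static List<string> SelectCandidates(
        IEnumerable<SketchTerminal> terminals, char current, ISet<string> expected = null)
    {
        return terminals
            .Where(t => t.FirstChars.IndexOf(current) >= 0)
            .Where(t => expected == null || expected.Contains(t.Name))
            .OrderByDescending(t => t.Priority)
            .Select(t => t.Name)
            .ToList();
    }

    public static void Main()
    {
        const string digits = "0123456789";
        const string letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
        var terminals = new[]
        {
            new SketchTerminal("distance", digits + "+-"),
            new SketchTerminal("name", letters + "_" + digits), // digits allowed first
        };

        // No parser advice: both terminals match '1', equal priority, so it is ambiguous.
        Console.WriteLine(string.Join(",", SelectCandidates(terminals, '1')));

        // After ':' the parser expects only a distance.
        var afterColon = new HashSet<string> { "distance" };
        Console.WriteLine(string.Join(",", SelectCandidates(terminals, '1', afterColon)));

        // At the start of a branch the parser expects a name, not a distance.
        var atBranchStart = new HashSet<string> { "name" };
        Console.WriteLine(string.Join(",", SelectCandidates(terminals, '1', atBranchStart)));
    }
}
```

With equal priorities the first call returns both candidates, which is exactly the ambiguous case; the second and third calls each narrow the list to a single terminal once the expected set is supplied.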
So which version of Irony are you using currently: the old release version or the latest code drop? If the latest, then I can try to speed things up, put this feature at the top of the list and implement it in a day or two... sorry, I'm terribly busy at the moment, day and night...
Will that work?

Coordinator
May 1, 2009 at 7:03 AM
It's there now. Your grammar is in the samples as ScannerTestGrammar; it allows identifiers like "1A". You can add "-" yourself. Whitespace, though, is really not so simple.
Roman
May 1, 2009 at 4:21 PM
Hi Roman,

Thank you so much for your answer! I use the release version, but recompiled against the Silverlight runtime. I will try your new version once I get the whole pipeline in my project working. As a workaround I changed name to a StringLiteral. I noticed that the software that outputs my target format keeps the quotes around a name if I use them in the input string. Of course, string quoting is not specified in the original grammar, and since this is a standard known as the Newick format, I would prefer a clean, standards-conforming solution like the one you suggested.

Jan
Coordinator
May 1, 2009 at 4:34 PM
You're welcome, and good luck. Let me know if you have any problems. I've added your grammar to the samples as the Newick grammar. It is also a test case for this Scanner-Parser link functionality.
Roman