XAML Markup Extension Grammar

Jul 30, 2015 at 1:47 PM
I'm trying to implement a parser for XAML markup extensions (https://msdn.microsoft.com/en-us/library/ee200269.aspx), which have a relatively simple syntax:
 MarkupExtension = "{" TYPENAME 0*1Arguments "}"
 Arguments       = (NamedArgs / (PositionalArgs 0*1("," NamedArgs)))
 NamedArgs       = NamedArg *("," NamedArg)
 NamedArg        = MEMBERNAME "=" STRING
 PositionalArgs  = NamedArg / (STRING 0*1("," PositionalArgs))
My work in progress is at: https://gist.github.com/PolarbearDK/dbfc1fd8d0ffd7101651

It's able to parse simple examples like:
{Foo}
{Foo Bar}

But not:
{Foo Bar=42}

From the trace I can see Irony considering TYPENAME just about everywhere, but it never considers NamedArg.

Can anybody spot what I'm doing wrong?
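
For reference, here is a rough Python sketch (not Irony, and not taken from the gist) of what the ABNF above should accept. It assumes a heavily simplified token model: TYPENAME and MEMBERNAME are plain identifiers, STRING is any run of characters other than whitespace, braces, commas and "=", and quoting, escapes and nested extensions are ignored. The name parse_markup_extension is made up for illustration.

```python
import re

# Simplified token shapes (assumptions, not the real XAML lexical rules).
IDENT = r"[A-Za-z_][\w.:]*"   # stands in for TYPENAME and MEMBERNAME
TEXT = r"[^\s{},=]+"          # stands in for STRING

def parse_markup_extension(source):
    """Return (type_name, positional_args, named_args) or raise ValueError."""
    pattern = r"\{\s*(" + IDENT + r")\s*(.*?)\s*\}"
    match = re.fullmatch(pattern, source)
    if match is None:
        raise ValueError("not a markup extension: %r" % source)
    type_name, body = match.groups()
    positional, named = [], {}
    if not body:
        return type_name, positional, named   # "{Foo}": no arguments
    named_pattern = r"(" + IDENT + r")\s*=\s*(" + TEXT + r")"
    for raw in body.split(","):
        item = raw.strip()
        named_match = re.fullmatch(named_pattern, item)
        if named_match:
            # NamedArg = MEMBERNAME "=" STRING
            named[named_match.group(1)] = named_match.group(2)
        elif re.fullmatch(TEXT, item) and not named:
            # Positional STRINGs are only allowed before the first NamedArg.
            positional.append(item)
        else:
            raise ValueError("bad argument: %r" % item)
    return type_name, positional, named
```

Under these assumptions all three inputs from this post are accepted, and {Foo Bar=42} yields one named argument (Bar = 42) rather than a positional one.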
Jul 30, 2015 at 2:11 PM
Edited Jul 30, 2015 at 3:28 PM
I would first clean up your code/grammar before trying anything else:
  1. Why are you defining custom terminals? I can't find a good reason for it.
  2. Try to use fewer implicit rules; they make things hard to process and debug.
  3. It's better to use alternatives to the Q() method.
Jul 30, 2015 at 2:23 PM
I HAVE cleaned it up as much as I can. Please enlighten me.
  1. The custom terminals are created to handle the odd syntax that governs this particular "language". See the Microsoft link for info. Do you think this can be achieved with less effort? How?
  2. Huh?
  3. OK, I found that in an old article about Irony, but it also mentioned Star and Plus methods, and they don't exist anymore... What are the alternatives? "| Empty"?
Jul 30, 2015 at 3:27 PM
Edited Jul 30, 2015 at 4:51 PM
1.

Typename doesn't seem to have any unusual rules. I think you can use IdentifierTerminal as long as you mark } as punctuation.
Membername/string is indeed unusual.

The comment was more because you explicitly mention having trouble with tokens, so I recommend first trying with standard tokens and only implementing the custom tokens once the production rules are correct.

2.

In your parse tree you get things like "Unnamed0" if you use "hidden rules". Basically, if your rules contain brackets, they could probably be written better. Also see 3.

3.
Empty is the direct translation of Q() and is more or less what Irony does internally, but you could also rewrite the rules slightly.

Original, using Q():
arguments.Rule = (namedArgs | (positionalArgs + ("," + namedArgs).Q())).Q();

Rewritten with explicit alternatives (Empty stands in for the outer Q()):
arguments.Rule = Empty
               | namedArgs
               | positionalArgs
               | positionalArgs + "," + namedArgs
               ;

Original, using Q():
positionalArgs.Rule = namedArg | (@string + ("," + positionalArgs).Q()).Q();

Rewritten with explicit alternatives:
positionalArgs.Rule = namedArg
                    | @string
                    | @string + "," + positionalArgs;
I find the second form easier to read and debug, but this might just be personal preference.
Jul 31, 2015 at 11:09 AM
Thanks, I finally figured out what the problem was.

Consider a rule something like this:
argument = NamedArg | TEXT
NamedArg = MEMBERNAME "=" TEXT

When deciding whether something is a NamedArg or a TEXT, Irony seems to evaluate only the FIRST terminal of each alternative. It then picks the winner based on the number of characters consumed. If the two tokens have equal length, the LAST one wins.
It never includes the "=" and TEXT of NamedArg in the decision.

In my case MEMBERNAME and STRING produce exactly the same length (both terminate at "="), so STRING always wins :(

Could this be a bug in Irony?

My solution is to ignore STRING if it is terminated by "=".
I have updated the sample and it now works as expected: https://gist.github.com/PolarbearDK/dbfc1fd8d0ffd7101651
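
To make the tie concrete, here is a toy Python model of the behaviour described above. It is not Irony's scanner code, only a sketch of "longest match wins, last candidate wins a tie" as I observed it. The token patterns and the names match_membername, match_string, pick_token and reject_before_equals are all made up for illustration.

```python
import re

MEMBERNAME_RE = re.compile(r"[A-Za-z_]\w*")   # assumed shape of MEMBERNAME
STRING_RE = re.compile(r"[^\s{},=]+")         # assumed shape of STRING

def match_membername(text, pos):
    m = MEMBERNAME_RE.match(text, pos)
    return m.group() if m else None

def match_string(text, pos, reject_before_equals=False):
    m = STRING_RE.match(text, pos)
    if m is None:
        return None
    # The workaround: refuse to produce STRING when "=" follows directly.
    if reject_before_equals and text.startswith("=", m.end()):
        return None
    return m.group()

def pick_token(text, pos, reject_before_equals=False):
    """Pick the longest candidate; on equal length the later one wins."""
    candidates = [
        ("MEMBERNAME", match_membername(text, pos)),
        ("STRING", match_string(text, pos, reject_before_equals)),
    ]
    best = None
    for name, lexeme in candidates:
        if lexeme is None:
            continue
        if best is None or len(lexeme) >= len(best[1]):
            best = (name, lexeme)
    return best
```

In this model pick_token("Bar=42", 0) returns STRING because both candidates stop at "=", while passing reject_before_equals=True lets MEMBERNAME through. A positional value such as "Bar}" is still scanned as STRING.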