Parsing out strings containing sub-expressions

Mar 9, 2010 at 10:32 PM

I am trying to write a parser which can take expressions roughly of the form

IdentifierTerminal identifier = new IdentifierTerminal(...);

FreeTextLiteral stringValue = new FreeTextLiteral(...);

KeyTerm opOpenSubExpr = new KeyTerm("$(");

... assume the undeclared terms are normal NonTerminals ...

expr.Rule = stringValue + exprOpt | subExprOpt;

exprOpt.Rule = expr | Empty;

subExprOpt.Rule = subExpr + exprOpt;

subExpr.Rule = opOpenSubExpr + identifier + opCloseParen;

The trick is that I want to instruct the system that if subExpr is not properly composed (for instance, if opCloseParen is missing or if identifier has invalid characters), it should instead re-interpret subExpr as a stringValue and consume everything up to some end point specified by that literal. For instance, the following inputs would generate these streams of terminals/non-terminals:

bar          --> stringValue

$(foo)       --> subExpr

bar$(foo     --> stringValue  (or stringValue stringValue would also be acceptable)

$(foo$(bar)  --> stringValue subExpr

foo$(bar)    --> stringValue subExpr

It seems like there should be some way to use an error rule to recognize when subExpr fails to parse and then consume all of that text into a stringValue (or similar construct), but I cannot get that to work. Can you offer any suggestions?

P.S. This is a fantastic toolkit.  The GrammarExplorer especially is a huge help.  Keep up the excellent work!

Mar 12, 2010 at 9:22 PM

Not sure if it is easy, or doable at all, in the current version. Try to hack something (modifying Irony classes) and let me know if you can make it work. If the hack is small and generic enough (so it can benefit other cases), I may put it in the code permanently.


Mar 12, 2010 at 10:51 PM

Yeah, I am thinking that what I need is a custom literal that can detect the unclosed parens situation and return the whole bit as a literal.  As I am coding it up I will consider how it can be made generic.  Thanks.
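
For anyone following along, here is a minimal sketch of the scanning logic such a literal would need. It is plain Python rather than Irony code, so it only illustrates the fallback idea and says nothing about Irony's actual terminal API. The identifier rule (a letter or underscore followed by letters, digits, or underscores) and the names SUB_EXPR and tokenize are my own assumptions for illustration.

```python
import re

# Assumption: identifiers are C-style names. A well-formed sub-expression
# is "$(" + identifier + ")" with nothing else in between.
SUB_EXPR = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def tokenize(text):
    """Split text into ("stringValue", s) and ("subExpr", name) tokens.

    A "$(" that does not start a well-formed sub-expression is not an
    error: it simply stays part of the surrounding string literal.
    """
    tokens = []
    literal_start = 0  # start of the string literal being accumulated
    pos = 0
    while pos < len(text):
        match = SUB_EXPR.match(text, pos)
        if match:
            # Flush any literal text collected before this sub-expression.
            if pos > literal_start:
                tokens.append(("stringValue", text[literal_start:pos]))
            tokens.append(("subExpr", match.group(1)))
            pos = match.end()
            literal_start = pos
        else:
            # Not a valid sub-expression here; keep the character as literal.
            pos += 1
    # Flush whatever literal text remains at the end of the input.
    if len(text) > literal_start:
        tokens.append(("stringValue", text[literal_start:]))
    return tokens


if __name__ == "__main__":
    for sample in ["bar", "$(foo)", "bar$(foo", "$(foo$(bar)", "foo$(bar)"]:
        print(sample, "-->", tokenize(sample))
```

The point of the design is that the decision is made in the scanner: the whole "$(identifier)" shape is checked before a subExpr token is produced, so the parser never sees a half-formed sub-expression and no error recovery rule is needed.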