Coping with literal brackets

Nov 13, 2012 at 10:58 AM
Edited Nov 13, 2012 at 11:07 AM


I am trying to develop a simple parser to parse something like

PARAM(arg1,arg2,(Bracketed Literal))


PARAM(arg1,arg2,Enter Value (Default=4))

so that the grammar knows that the brackets in argument 3 belong to the argument itself rather than indicating precedence in the expression.

Here are some excerpts from my grammar


Dim parameterArgLiteral = New RegexBasedTerminal("parameterArg", "[a-zA-Z0-9 _<>\-\(\)]*")

parameterOperator.Rule = ToTerm("PARAM")
parameterExpression.Rule = parameterOperator + LPAREN + parameterArgsExpression + RPAREN
parameterArgExpression.Rule = parameterArgLiteral Or Empty
parameterArgsExpression.Rule = parameterArgExpression + [COMMA] + parameterArgExpression Or 
                               parameterArgExpression + [COMMA] + parameterTypeExpression + [COMMA] + parameterKeyExpression


'Operator precedence            
RegisterOperators(10, "*", "/", "\", "%")
RegisterOperators(9, "+", "-")
RegisterOperators(8, "=", ">", "<", ">=", "<=", "<>", "!=", "!<", "!>")
RegisterOperators(7, "^", "&", "|")
RegisterOperators(6, "NOT", "IS")
RegisterOperators(5, "LIKE", "IN")
RegisterOperators(4, "AND")
RegisterOperators(3, "OR")

MarkPunctuation("(", ")", ".", ",")



I'm struggling to see how to set up the grammar to achieve that. Should I be looking at a custom terminal really?

Many thx


Nov 13, 2012 at 6:20 PM

ParameterArgExpression.Rule is assigned twice - that does not look reasonable, probably a typo

Not quite clear from your explanation what you're trying to achieve. Pls give input samples

Nov 14, 2012 at 7:11 AM

Hi and thx for replying

re ParameterArgExpression - not quite sure where you are seeing that. There is a singular ArgExpression and multiple ArgsExpression

I have an expression I am trying to parse say

MyVar = PARAM([name], [datatype], [prompttext])

where [name] is the name of the parameter, [datatype] is its type, and [prompttext] is the text which could, say, be displayed to the user if the value is going to be captured via a form or something like that.

The prompt text, though, is the issue since it is pretty much free text. It could be quoted, bracketed, or neither. When it is neither bracketed nor quoted, it could still contain free text which itself holds brackets, e.g. consider the text "Please enter date (default = 1/1/2012)". It could be used as follows

MyDoB = PARAM(dob,DATE,Please enter date (default = 1/1/2012))

I believe my current parser, i.e. one using a RegexBasedTerminal, isn't good enough: the date slashes are getting recognised as separate tokens rather than the text after the second comma being seen as a whole.
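To see what a character-class pattern does with this kind of input, the sketch below runs the pattern from the grammar excerpt against the sample arguments. It uses Python's `re` module rather than the .NET regex engine (an assumption made to keep the snippet self-contained; this pattern only uses constructs the two engines share), and it tests the pattern in isolation, not inside Irony's scanner. `PATTERN` and `matched_prefix` are names made up for the illustration.

```python
import re

# The character class from the grammar excerpt, copied unchanged.
PATTERN = re.compile(r"[a-zA-Z0-9 _<>\-\(\)]*")

def matched_prefix(text):
    """Return the leading part of text that the pattern consumes."""
    return PATTERN.match(text).group(0)

# Text following the second comma in each sample,
# including the bracket that closes PARAM itself.
print(repr(matched_prefix("Please enter date (default = 1/1/2012))")))
print(repr(matched_prefix("Enter Value (Default=4))")))
print(repr(matched_prefix("(Bracketed Literal))")))
```

The first two matches stop at `=`, which is not in the class, and the third runs through both closing brackets, including the one that should close PARAM. Widening the class would not fix the third case, since a character class cannot keep track of bracket nesting.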

I am currently creating a custom terminal to handle this and find the end of the argument, but am wondering if I am doing it wrong. For example, should I force it to be quoted or bracketed? Or how else could I force it to treat the data after the 2nd argument as a whole?

Is that any clearer?

Thx again



Nov 15, 2012 at 9:56 PM
Edited Nov 15, 2012 at 9:57 PM

You need to put an "@" in front of RE patterns so that escape sequences are passed through to RegEx, like so:

@"some pattern text"

Nov 16, 2012 at 4:15 AM

Thx for replying. I'm using VB. Not sure that is still necessary, is it? VB compiler appears to object.

Nov 16, 2012 at 4:35 AM

So you are - I had not noticed those "Dim" keywords. You may need to double up your back-slashes then, if @ cannot precede the string, as I thought that VB and C# used a common set of .NET string special characters. I haven't programmed in VB for a while, so I don't remember for sure.

Nov 16, 2012 at 2:04 PM

Sorry ... that should have said "VB compiler does NOT appear to object"! ... completely the opposite of what I wrote!

But it still does not handle the priority/precedence I am after



Nov 16, 2012 at 3:28 PM

Try replacing the back-slash hyphen near the end of the RE with just a hyphen immediately following the open-square-bracket. I am not sure how the escaped hyphen is going to be treated where it is, but the recommended way to add a hyphen to a set is to start the set definition with "[-" instead of just "[".
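A quick way to check whether the hyphen placement matters is to compare the two spellings side by side. The sketch below does that with Python's `re` module; that the .NET engine treats an escaped hyphen inside a set the same way is an assumption worth confirming there. `ESCAPED`, `LEADING` and `same_match` are names invented for the illustration.

```python
import re

# Hyphen escaped near the end of the set, as in the original pattern.
ESCAPED = re.compile(r"[a-zA-Z0-9 _<>\-\(\)]*")
# Hyphen placed first in the set, as suggested above;
# brackets need no escape inside a set.
LEADING = re.compile(r"[-a-zA-Z0-9 _<>()]*")

def same_match(text):
    """True when both spellings consume the same prefix of text."""
    return ESCAPED.match(text).group(0) == LEADING.match(text).group(0)

for sample in ["a-b", "x - y (z)", "1/1/2012", "<->"]:
    print(sample, same_match(sample))
```

If the two spellings agree on every sample, as they do in Python, the escaped hyphen is unlikely to be what splits the third argument.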

Nov 18, 2012 at 10:38 PM

I don't understand one basic thing, your initial problem

  1. is it that your Regex-based terminal does not work the way you want - so the problem is just a proper reg expression

  2. OR - beyond the regex, even when it "works", the parser does not parse and does something wrong. 

If it is case 1, I can hardly help, as I am not a wiz in regular expressions. One thing I noticed: according to the grammar, parameterArgLiteral can appear as the first parameter, the second or the third (inside parentheses). That means the expression terminators should include ")" (which your regex does treat specially) and comma (the delimiter between expressions, when the literal appears as param #1); and comma is not mentioned in your regex.

If it is #2, then provide more info (are there conflicts? what is the token list produced and what is the parser error?), but I bet it all comes from regex not properly working. 

I would definitely recommend going with a double-quoted string instead of such a free-form string which ends with h.. knows what and where - this may be as confusing for the user/script writer as it is for the Irony parser.


Nov 19, 2012 at 5:43 AM
Edited Nov 19, 2012 at 5:50 AM

Hi Roman

The problem is not the regex. Sorry to you and pg for not making that clearer. An example of the problem is that the 3rd argument of the function

PARAM(dob,DATE,Please enter date (default = 1/1/2012))

is not being recognized as a whole argument, i.e. as a ParameterArgExpression, but rather is being parsed into numbers and strings split by the brackets and date separators. The grammar is fine (i.e. it loads into the Grammar Explorer without error). When parsing the above, although the parser recognises arguments 1 and 2 as ParameterArgExpression, the final token before the parser reports a problem parsing the string is

Please (identifier)

so it is already trying to split up the 3rd argument.

If double quoting is the best way to do this, then that's great. I just wasn't sure if that would be the preferable way. I had wondered whether there was a way to construct the grammar so as to treat the text after the second comma (and before either the next comma - there are other optional parameters - or the matching bracket) as a whole.
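The rule described here (take everything up to the next top-level comma or the bracket that closes PARAM) can be sketched as a small bracket-counting scan. This is only the scanning logic, written in Python so it stands alone; how it would be wired into an Irony custom terminal is not shown, and `scan_free_text_argument` is a hypothetical name, not part of any library.

```python
def scan_free_text_argument(source, start):
    """Scan a free-text argument beginning at index start.

    Returns (argument_text, end_index). Scanning stops at the first
    comma or ')' found at nesting depth 0; that character is not consumed.
    """
    depth = 0
    pos = start
    while pos < len(source):
        ch = source[pos]
        if ch == "(":
            depth += 1           # bracket opened inside the argument
        elif ch == ")":
            if depth == 0:
                break            # this ')' closes PARAM itself
            depth -= 1           # closes a bracket opened inside the argument
        elif ch == "," and depth == 0:
            break                # top-level comma: the next argument starts
        pos += 1
    return source[start:pos].strip(), pos

text = "PARAM(dob,DATE,Please enter date (default = 1/1/2012))"
argument, end = scan_free_text_argument(text, text.index("Please"))
print(argument)
```

One consequence of this design is that an unbalanced bracket inside the prompt text would shift where the argument ends, which is part of the case for requiring quotes.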

Thx (to both of you)