
Validating regular expressions

Dec 9, 2008 at 5:09 PM
I am working on a grammar that has a regular expression between two double quotes, e.g. "[0-9]+.[0-9]+". I'd like to use the .NET regex code to validate the expression instead of recreating the regex grammar in my grammar. Is this something that a filter between the scanner and the parser could handle? It could also emit a single token containing the regex. I'm thinking the code to handle a single token that could be handed off to the .NET regex would be easier than writing code to interpret the regular expression. If a filter will work, can you point me to the class I should use for a custom filter?

Dec 9, 2008 at 6:25 PM
First try using RegExBasedTerminal - it was created just for cases like yours.
Another way to go is to create a custom terminal class inherited from StringLiteral. Override its ConvertValue method, call base.ConvertValue, and if it returns true, validate details.Body. If the value is not valid, set details.Error = "<error message>" and still return true.
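
For the validation step itself, here is a minimal sketch that does not depend on Irony at all. It only shows how the .NET Regex class can be asked whether a pattern is well formed. The class name RegexPatternValidator and the method TryValidate are made up for illustration; how the result gets wired into a terminal or an error list depends on the Irony version in use.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Irony or the .NET framework.
public static class RegexPatternValidator
{
    // Returns true if 'pattern' parses as a .NET regular expression.
    // On failure, returns false and puts the parser's message in 'error'.
    public static bool TryValidate(string pattern, out string error)
    {
        error = null;
        if (pattern == null)
        {
            error = "Pattern is null.";
            return false;
        }
        try
        {
            // The Regex constructor parses the pattern and throws
            // ArgumentException when the syntax is invalid.
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException ex)
        {
            error = ex.Message;
            return false;
        }
    }
}
```

The string passed in should be the text between the quotes, with the quotes themselves already stripped. On failure, the error text can be used as the message reported for the token.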
Dec 9, 2008 at 7:05 PM
Thanks for the quick reply. RegExBasedTerminal worked just fine. I'm also using it for column/variable names now. It cut about 100 states out of the tree.

I created another constructor that included the name. It makes the code cleaner looking and mimics the Terminal class constructor. How do you handle submissions of something like that?
Dec 9, 2008 at 7:17 PM
Edited Dec 9, 2008 at 7:17 PM
In fact I already did. Your question prompted me to look at the code and I realized that the thing wasn't right.
By the way, I don't think you can create another constructor with the name - having all string parameters plus a params string[] argument makes it impossible to distinguish the two constructors, so you have to extend the existing constructor. The new version will be in the next code drop.
Dec 9, 2008 at 7:23 PM
Here's what I did:

public RegexBasedTerminal(string pattern, string name, params string[] prefixes)
        : base("RegEx:{" + pattern + "}")
{
        Pattern = pattern;
        Name = name;
}

with the following code to instantiate an object:

var Text_colPattern = new RegexBasedTerminal("\".*\"", "Text_colPattern");

The compiler sees that the last string isn't a string array, so it uses the new constructor. At least it works that way in my code.

Dec 9, 2008 at 7:29 PM
Well, I don't quite understand... did you add a new constructor, or change the existing one? And shouldn't you pass the name to the base constructor?
Anyway, it is fixed now. The only difference in my version is that the name parameter comes first:


public RegexBasedTerminal(string name, string pattern, params string[] prefixes) : base (name) { ....


Dec 9, 2008 at 8:15 PM
Oops... surprisingly, for me at least, the C# compiler does allow two constructors, with and without the extra string parameter name (?!). I thought it would complain that the two overloads are indistinguishable. Whatever, let's stick to a single version with the name parameter coming first.
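
To illustrate why the compiler accepts both overloads, here is a small standalone sketch. DemoTerminal and its ChosenOverload property are invented for the demonstration and are not Irony classes. When both constructors are applicable only in their expanded params form and the expanded parameter lists look the same, C# overload resolution prefers the one with more declared parameters, so the call is not ambiguous.

```csharp
using System;

// Hypothetical stand-in for a terminal class with two params-based constructors.
public class DemoTerminal
{
    public string Name { get; private set; }
    public string Pattern { get; private set; }
    public string[] Prefixes { get; private set; }
    // Records which constructor the compiler bound the call to.
    public string ChosenOverload { get; private set; }

    public DemoTerminal(string pattern, params string[] prefixes)
    {
        Pattern = pattern;
        Prefixes = prefixes;
        ChosenOverload = "(pattern, prefixes)";
    }

    public DemoTerminal(string name, string pattern, params string[] prefixes)
    {
        Name = name;
        Pattern = pattern;
        Prefixes = prefixes;
        ChosenOverload = "(name, pattern, prefixes)";
    }
}
```

With these definitions, new DemoTerminal("Text_colPattern", "\".*\"") binds to the three-parameter constructor, while new DemoTerminal("\".*\"") binds to the two-parameter one. The catch is that a caller who means "pattern plus one prefix" and passes two strings silently gets the name/pattern overload instead, which is a good reason to keep a single constructor.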
Dec 9, 2008 at 9:08 PM
I've already changed my copy of the class to have the name param. at the beginning. I figured that's where you were heading.

One other question has arisen as I have been playing with RegExBasedTerminal. It just uses the reg ex to match the terminal, and what I want to do is validate the reg ex inside the quotes. I think that leads me back to a filter, but I can't find an example of how to write one and then to hook it into the scanner-parser stream. Have you got any good examples you can point me to?
Dec 9, 2008 at 10:45 PM
Edited Dec 9, 2008 at 10:46 PM
I see, you want a quoted string to be recognized as your special token, but add validation if contents are not right. 
One possibility is to hook to Scanner.TokenCreated and check the token - if it is not valid, change its properties to make it an error token (don't replace the token itself in args).
The other way is to subclass StringLiteral, as I explained before.
Token filter - maybe, but in general token filters are suited to a different kind of job: intercepting the stream and injecting tokens into it or removing them from it. You can try it - just follow the pattern in the two existing filters.
Dec 9, 2008 at 10:49 PM
Correction: with the first method, hooking to Scanner.TokenCreated - don't do anything with the token if it's not valid, just add an error message to the CompilerContext.Errors collection.