How can I best grab a block of text that may contain the ending character?

Aug 8, 2014 at 8:35 PM
I'm parsing a configuration file that includes blocks like this:
rule MyConfigurationRule {
    when HTTP_REQUEST {
        if {[string tolower [HTTP::host]] contains "dev-" } {
            pool dev-pool
        elseif {[string tolower [HTTP::host]] contains "beta-" } {
            pool beta-pool
I don't actually care about parsing the details of the rule; I just need it back as a string so I can tell whether it has changed. Ideally, I would like to have a rule defined like this:
    RuleDefinition.Rule = ToTerm("rule") + Identifier + "{" + RuleBody + "}";
I've tried FreeTextLiteral and StringLiteral, but the embedded closing brackets keep causing issues. Any suggestions would be appreciated.
Aug 9, 2014 at 6:09 AM
I don't think that is a reasonable approach. You want to grab everything up to the closing brace as one piece of plain text, but the braces must be counted to account for nested blocks ( {..} ), so the parser/scanner has to analyze the content and count braces. And what about occurrences inside string literals, which should not be counted, I guess? (What if the "dev-" constant were "dev-{abc}{"?)
If you insist on going with a terminal, you have to create a custom terminal (similar to FreeTextLiteral or HereDoc) and add some tricky logic inside that detects which closing brace is the final one.
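The brace counting described above could look roughly like the sketch below. It is not Irony code and is not wired into a custom terminal; `BraceScanner` and `FindClosingBrace` are hypothetical names, and the sketch assumes that string literals are double-quoted with backslash escapes, which may not hold for every configuration format.

```csharp
using System;

static class BraceScanner
{
    // Returns the index of the brace that closes the one at openIndex,
    // or -1 if the text ends first. Braces inside double-quoted strings
    // are ignored (assumption: "..." with backslash escapes).
    public static int FindClosingBrace(string text, int openIndex)
    {
        int depth = 0;
        bool inString = false;
        for (int i = openIndex; i < text.Length; i++)
        {
            char c = text[i];
            if (inString)
            {
                if (c == '\\') i++;                 // skip the escaped character
                else if (c == '"') inString = false;
            }
            else if (c == '"') inString = true;
            else if (c == '{') depth++;
            else if (c == '}' && --depth == 0) return i;
        }
        return -1;
    }

    static void Main()
    {
        string text = "rule R { if { h contains \"dev-{\" } { pool p } }";
        int open = text.IndexOf('{');
        int close = FindClosingBrace(text, open);
        // The body is everything between the outer braces.
        Console.WriteLine(text.Substring(open + 1, close - open - 1));
    }
}
```

A custom terminal would presumably run a scan like this from the current source position and emit the text between the outer braces as a single token.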
Aug 9, 2014 at 6:29 AM
Edited Aug 12, 2014 at 10:51 PM
I found a nice regex in the regex documentation on MSDN that uses balancing groups to match nested braces. With the RuleDefinition above, it works really well:
Terminal RuleBody = new RegexBasedTerminal("RuleBody", "[^{}]*(((?'Open'{)[^{}]*)+((?'Close-Open'})[^{}]*)+)*(?(Open)(?!))");
As you mentioned, the one caveat: curly brackets embedded in strings. These are going to require even more special handling.
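To see what the pattern does outside of Irony, here is a small sketch that runs it directly with System.Text.RegularExpressions. The class name, the `MatchBody` helper, the `\G` anchor, and the sample inputs are assumptions made for illustration; only the pattern itself comes from the terminal above.

```csharp
using System;
using System.Text.RegularExpressions;

static class BalancedBodyDemo
{
    // Same pattern as the RuleBody terminal above.
    const string Pattern =
        "[^{}]*(((?'Open'{)[^{}]*)+((?'Close-Open'})[^{}]*)+)*(?(Open)(?!))";

    // Matches the body starting right after the first '{' in the input.
    // \G anchors the match at the start index passed to Match.
    public static Match MatchBody(string input)
    {
        int bodyStart = input.IndexOf('{') + 1;
        return new Regex(@"\G(?:" + Pattern + ")").Match(input, bodyStart);
    }

    static void Main()
    {
        // Balanced body: the match should stop just before the rule's own '}'.
        string ok = "rule R { when X { if {a} { pool p } } } rule Next { }";
        Match m1 = MatchBody(ok);
        Console.WriteLine("[" + m1.Value + "]");
        Console.WriteLine(ok[m1.Index + m1.Length]);

        // Brace inside a string literal: it is counted as an opening brace,
        // so the match runs on and swallows the rule's closing brace.
        string bad = "rule R { if { h contains \"dev-{\" } { pool p } }";
        Match m2 = MatchBody(bad);
        Console.WriteLine(m2.Index + m2.Length == bad.Length);
    }
}
```

In the second input, the grammar's trailing "}" would have nothing left to match, which is the string-literal caveat in concrete form.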

Thanks for the response, Roman. I appreciate you taking the time to answer questions.
Marked as answer by devwulf on 8/8/2014 at 10:30 PM
Aug 15, 2014 at 6:15 PM
Here's the regex I've ended up with, which handles curly brackets that are escaped like this: \{ \}
Terminal RuleBody = new RegexBasedTerminal("RuleBody", "([^\\\\{}]|\\\\{|\\\\})*(((?'Open'(?!\\\\{){)([^\\\\{}]|\\\\{|\\\\})*)+((?'Close-Open'(?!\\\\})})([^\\\\{}]|\\\\{|\\\\})*)+)*(?(Open)(?!))");
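A quick way to check the escaped-brace version is to run the same pattern string directly against a sample. As before, the class name, the `MatchBody` helper, the `\G` anchor, and the sample input are illustrative assumptions and not part of the grammar.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapedBodyDemo
{
    // Same pattern string as the RuleBody terminal above.
    const string Pattern =
        "([^\\\\{}]|\\\\{|\\\\})*(((?'Open'(?!\\\\{){)([^\\\\{}]|\\\\{|\\\\})*)+((?'Close-Open'(?!\\\\})})([^\\\\{}]|\\\\{|\\\\})*)+)*(?(Open)(?!))";

    // Matches the body starting right after the first '{' in the input.
    public static Match MatchBody(string input)
    {
        int bodyStart = input.IndexOf('{') + 1;
        return new Regex(@"\G(?:" + Pattern + ")").Match(input, bodyStart);
    }

    static void Main()
    {
        // \} and \{ are escaped and should not affect the brace count.
        string input = @"rule R { if {a \} b} { pool \{x } } tail";
        Match m = MatchBody(input);
        Console.WriteLine("[" + m.Value + "]");
        // The character right after the match should be the rule's own '}'.
        Console.WriteLine(input[m.Index + m.Length]);
    }
}
```

One thing to watch, judging from the pattern itself: a backslash that is not followed by a brace is not covered by any alternative, so the match stops at such a character.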