What would be the grammar for this?

Oct 9, 2011 at 5:19 PM
Edited Oct 9, 2011 at 5:24 PM

For a DSL that I am working on, I need to capture the block of data between VAR and END_VAR. There is no particular rule for parsing the content in between; it may be free-flowing text, and I have to capture all of it.

VAR
this is a multi-line free flow text; ....... @$#!#QWEQ ajkadkfjlj-((((((((
,,,,,,,,,,,~~ end of rubbish
END_VAR

A C# regular expression for this might be @"\bVAR\b(.*?)\bEND_VAR\b" (with RegexOptions.Singleline, so that . also matches line breaks). But I cannot find a way to capture the content between the keywords VAR and END_VAR using Irony.

Can anyone suggest an approach, please?
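For comparison, here is roughly how the regex route might look in plain .NET, outside Irony. It is only a sketch: the class and method names are made up for illustration. The points that matter are the capture group around the lazy .*? and RegexOptions.Singleline, without which . does not match line breaks and a multi-line block would never match.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Irony: extracts the raw text of the first VAR ... END_VAR block.
public static class VarBlockRegex
{
    // Singleline lets '.' match '\n', so the lazy group can span several lines.
    // \bVAR does not match inside "END_VAR", because '_' is a word character (no boundary).
    private static readonly Regex VarBlock =
        new Regex(@"\bVAR\b(.*?)\bEND_VAR\b", RegexOptions.Singleline);

    // Returns the text between VAR and END_VAR (including the surrounding line breaks),
    // or null if the input contains no complete block.
    public static string ExtractVarContent(string input)
    {
        Match m = VarBlock.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

This only locates the text; it does not plug into an Irony grammar, which is what the rest of the thread is about.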

Coordinator
Oct 9, 2011 at 5:23 PM

The terminal for capturing free-form text up to an end keyword is FreeTextLiteral.

Oct 9, 2011 at 5:25 PM

Oh! Could you post a small example here, please?

Coordinator
Oct 9, 2011 at 5:47 PM

Look at FreeTextLiteralTests.cs in the test project. You can have escaped chars inside the text, and the test code shows how to do this.

Basically, all you need to do is create a FreeTextLiteral that grabs everything until END_VAR:

var varContent = new FreeTextLiteral("varContent", "END_VAR");

varDecl.Rule = "VAR" + NewLine + varContent;
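To make "grab everything until END_VAR" concrete, here is a tiny stand-alone model. It is not Irony code: ScanUntil and its consumeTerminator flag are hypothetical, the flag being only loosely modeled on the FreeTextOptions.ConsumeTerminator option that comes up later in the thread (my assumption is that it makes scanning resume after the terminator rather than at it). Because it searches for the whole terminator string, it cannot mistake an 'E' in some other word for the start of END_VAR.

```csharp
using System;

// Hypothetical model, not Irony code: scan free text from 'start' up to 'terminator'.
public static class FreeTextModel
{
    // Returns the text between 'start' and the terminator, or null if the terminator
    // never appears. 'next' is where scanning would resume: at the terminator, or just
    // after it when consumeTerminator is true.
    public static string ScanUntil(string text, int start, string terminator,
                                   bool consumeTerminator, out int next)
    {
        // Ordinal search for the whole terminator string, not just its first char.
        int end = text.IndexOf(terminator, start, StringComparison.Ordinal);
        if (end < 0)
        {
            next = start; // nothing consumed
            return null;
        }
        next = consumeTerminator ? end + terminator.Length : end;
        return text.Substring(start, end - start);
    }
}
```

For example, with the input "VAR\r\nMESSAGE:STRING80;\r\nEND_VAR", starting at index 5 (just after "VAR" and the line break, which the grammar rule matches separately) yields "MESSAGE:STRING80;\r\n".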

Oct 9, 2011 at 6:56 PM

Thanks, Roman, for providing the example! I also had a look at the file.

I'll work it out from here.

Regards

Oct 9, 2011 at 7:18 PM
Edited Oct 9, 2011 at 7:19 PM

OK... so there is a problem: the rule makes the scanner go into an infinite loop.

var varContent = new FreeTextLiteral("varContent", "END_VAR");
var var_block = new NonTerminal("var_block", "VAR" + NewLine + varContent);

In FreeTextLiteral.TryMatch, newPos always stays the same:

        public override Token TryMatch(ParsingContext context, ISourceStream source)
        {
            string tokenText = string.Empty;
            while (true)
            {
                //Find next position
                var newPos = source.Text.IndexOfAny(_stopChars, source.PreviewPosition);
                if (newPos == -1)
                {
                    if (IsSet(FreeTextOptions.AllowEof))
                    {
                        source.PreviewPosition = source.Text.Length;
                        return source.CreateToken(this.OutputTerminal);
                    }
                    else
                        return null;
                }
                tokenText += source.Text.Substring(source.PreviewPosition, newPos - source.PreviewPosition);
                source.PreviewPosition = newPos;
                //if it is escape, add escaped text and continue search
                if (CheckEscape(source, ref tokenText))
                    continue;
                //check terminators
                if (CheckTerminators(source, ref tokenText))
                    break; //from while (true)
                //otherwise PreviewPosition is unchanged, so the next IndexOfAny finds the same stop char again
            }
            return source.CreateToken(this.OutputTerminal, tokenText);
        }

Input is:

VAR
MESSAGE:STRING80;
(*_ORError Message*)
END_VAR

 

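The problem seems to be the stop character 'E': it is the first character of "END_VAR", but it also appears in "MESSAGE". When IndexOfAny stops at the 'E' in "MESSAGE", neither CheckEscape nor CheckTerminators matches, so PreviewPosition never moves past that 'E' and the next iteration finds the same position again.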

Until I figure it out, I am posting the problem here.
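A stripped-down sketch of the two steps that matter (find the next stop char, then test for the terminator) reproduces the stall. Step is a hypothetical helper, not Irony's API, and it assumes the only stop char is 'E', the first character of END_VAR.

```csharp
// Hypothetical helper, not Irony code: one iteration of the stop-char search.
public static class StopCharDemo
{
    // Finds the next stop char at or after 'position', then checks whether
    // the terminator starts there.
    public static (int NewPos, bool IsTerminator) Step(
        string text, int position, char[] stopChars, string terminator)
    {
        int newPos = text.IndexOfAny(stopChars, position);
        bool isTerminator = newPos >= 0
            && newPos + terminator.Length <= text.Length
            && string.CompareOrdinal(text, newPos, terminator, 0, terminator.Length) == 0;
        return (newPos, isTerminator);
    }
}
```

IndexOfAny includes the start index in its search, so calling Step again from the returned position gives back the same position whenever that 'E' is not a terminator. The loop only makes progress if something moves the position past it.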

Coordinator
Oct 9, 2011 at 10:47 PM

Try adding the ConsumeTerminator option:

var varContent = new FreeTextLiteral("varContent", FreeTextOptions.ConsumeTerminator, "END_VAR");

Oct 10, 2011 at 3:19 AM

No difference. It is still stuck in an infinite loop for the same reason. :(

Oct 10, 2011 at 4:46 AM
Edited Oct 10, 2011 at 4:48 AM

This fixed it, but I don't know whether it works in all cases.

In FreeTextLiteral.TryMatch, change this line to:

//Find next position
var newPos = source.Text.IndexOfAny(_stopChars, source.PreviewPosition + 1);

And the usage is:

var varContent = new FreeTextLiteral("varContent", FreeTextOptions.ConsumeTerminator, "END_VAR");
var var_block = new NonTerminal("var_block", "VAR" + NewLine + varContent + NewLinePlus);

Coordinator
Oct 10, 2011 at 5:16 AM

Yes, it looks like you hit a bug. My apologies. I'm not sure yet what the proper fix is; with the change you suggest, it won't correctly grab an "empty" free-text literal. Let me think a bit.

Oct 10, 2011 at 3:25 PM

Yes, you are right. This test fails after my changes:

var term = new FreeTextLiteral("FreeText", ",", ")");
term.Escapes.Add(@"\\", @"\");
term.Escapes.Add(@"\,", @",");
term.Escapes.Add(@"\)", @")"); 

SetTerminal(term);
TryMatch(@"abc\\de\,\)fg,");
Assert.IsNotNull(_token, "Failed to produce a token on valid string.");
Assert.AreEqual(term, _token.Terminal, "Failed to scan a string - invalid Terminal in the returned token.");
Assert.AreEqual(_token.Value.ToString(), @"abc\de,)fg", "Failed to scan a string");  // <-- fails
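Both failures seem to share one cause: with + 1, the character at PreviewPosition is never examined. A minimal sketch (NextStop is a hypothetical helper, not Irony's API) covers both cases: an empty literal whose terminator is the very first character, and the failing test above, where the \) escape starts exactly where the scan resumes after the \, escape (index 9). With + 1 the search lands on the ')' at index 10 instead, so, by my reading of the loop, the backslash is copied literally and ')' is taken as the terminator, giving abc\de,\ rather than abc\de,)fg.

```csharp
// Hypothetical helper, not Irony code: the stop-char search with and without the "+ 1" change.
public static class PlusOneDemo
{
    // skipCurrent = true models the "+ 1": the char at 'position' is never examined.
    public static int NextStop(string text, char[] stopChars, int position, bool skipCurrent)
    {
        int from = skipCurrent ? position + 1 : position;
        // IndexOfAny accepts a start index equal to text.Length (it returns -1 there).
        return from > text.Length ? -1 : text.IndexOfAny(stopChars, from);
    }
}
```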

Oct 10, 2011 at 3:56 PM
Edited Oct 10, 2011 at 5:46 PM

Will this be helpful?

        public override Token TryMatch(ParsingContext context, ISourceStream source)
        {
            string tokenText = string.Empty;
            while (true)
            {
                //Find next position
                var newPos = source.Text.IndexOfAny(_stopChars, source.PreviewPosition); 
                if (newPos == -1)
                {
                    if (IsSet(FreeTextOptions.AllowEof))
                    {
                        source.PreviewPosition = source.Text.Length;
                        return source.CreateToken(this.OutputTerminal);
                    }
                    else
                        return null;
                }
                tokenText += source.Text.Substring(source.PreviewPosition, newPos - source.PreviewPosition);
                source.PreviewPosition = newPos;
                //if it is an escape, add the escaped text and continue the search
                if (!CheckEscape(source, ref tokenText))
                {
                    //check terminators
                    if (CheckTerminators(source, ref tokenText))
                        break; //from while (true)
                    //not a terminator: keep this char and move past it
                    tokenText += source.Text[source.PreviewPosition++];
                }
            }
            return source.CreateToken(this.OutputTerminal, tokenText);
        }
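The control flow proposed above (escape: append and continue; terminator: stop; anything else: keep the char and step past it) can be checked in isolation with a stand-alone model. This is a sketch under stated assumptions, not Irony's implementation: FreeTextScanner is hypothetical, the source stream is a plain string plus a position, "no token" is reported as null, and with allowEof the rest of the input is returned as the value. It accumulates the value in a StringBuilder rather than by repeated string concatenation.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical stand-alone model of the proposed FreeTextLiteral.TryMatch loop (not Irony code).
public sealed class FreeTextScanner
{
    private readonly string[] _terminators;
    private readonly Dictionary<string, string> _escapes;
    private readonly bool _allowEof;
    private readonly char[] _stopChars;

    public FreeTextScanner(IEnumerable<string> terminators,
                           IDictionary<string, string> escapes = null,
                           bool allowEof = false)
    {
        _terminators = terminators.ToArray();
        _escapes = escapes == null
            ? new Dictionary<string, string>()
            : new Dictionary<string, string>(escapes);
        _allowEof = allowEof;
        // Stop chars: the first char of every terminator and of every escape sequence.
        _stopChars = _terminators.Select(t => t[0])
            .Concat(_escapes.Keys.Select(e => e[0]))
            .Distinct()
            .ToArray();
    }

    // Returns the literal's value, or null when no terminator is found and EOF is not allowed.
    public string Scan(string text, int start = 0)
    {
        var value = new StringBuilder();
        int pos = start;
        while (true)
        {
            int newPos = text.IndexOfAny(_stopChars, pos);
            if (newPos == -1)
            {
                if (!_allowEof) return null;
                value.Append(text, pos, text.Length - pos); // take the rest of the input
                return value.ToString();
            }
            value.Append(text, pos, newPos - pos); // plain text before the stop char
            pos = newPos;

            // Escape: append its replacement and resume right after the sequence.
            string escape = _escapes.Keys
                .Where(e => StartsAt(text, pos, e))
                .OrderByDescending(e => e.Length)
                .FirstOrDefault();
            if (escape != null)
            {
                value.Append(_escapes[escape]);
                pos += escape.Length;
                continue;
            }

            // Terminator: the literal ends here; the terminator is not part of the value.
            if (_terminators.Any(t => StartsAt(text, pos, t)))
                return value.ToString();

            // A char that only looks like a stop char (e.g. the 'E' in "MESSAGE"):
            // keep it and step past it, so the next IndexOfAny cannot return the same index.
            value.Append(text[pos]);
            pos++;
        }
    }

    private static bool StartsAt(string text, int pos, string s) =>
        pos + s.Length <= text.Length &&
        string.CompareOrdinal(text, pos, s, 0, s.Length) == 0;
}
```

With terminators "," and ")" and the three escapes from the failing test, this model returns abc\de,)fg for abc\\de\,\)fg, and an empty string for a lone ",", so it handles both the escape case and the empty-literal case that the + 1 change broke.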

 

To test it, I used these tests:

        [TestMethod]
        public void TestFreeTextLiteral()
        {
            //VAR
            //MESSAGE:STRING80;
            //(*_ORError Message*)
            //END_VAR
            var term = new FreeTextLiteral("varContent", "END_VAR");
            SetTerminal(term);
            TryMatch("VAR\r\nMESSAGE:STRING80;\r\n(*_ORError Message*)\r\nEND_VAR");
            Assert.IsNotNull(_token, "Failed to produce a token on valid string.");
            Assert.AreEqual(term, _token.Terminal, "Failed to scan a string - invalid Terminal in the returned token.");
            Assert.AreEqual(_token.Value.ToString(), "VAR\r\nMESSAGE:STRING80;\r\n(*_ORError Message*)\r\n", "Failed to scan a string");

            term = new FreeTextLiteral("blank_test", FreeTextOptions.AllowEof);
            SetTerminal(term);
            TryMatch(string.Empty);
            Assert.IsNotNull(_token, "Failed to produce a token on valid string.");
            Assert.AreEqual(term, _token.Terminal, "Failed to scan a string - invalid Terminal in the returned token.");
            Assert.AreEqual(_token.Value.ToString(), string.Empty, "Failed to scan a string");
        }

Will this work? :)

Coordinator
Oct 11, 2011 at 2:35 PM

Yeah, something like this. The method is becoming a bit bloated and inefficient, though, and I'm thinking about refactoring it. Thanks!

Roman