Grammar not giving correct output. What's wrong?

Oct 12, 2011 at 9:46 AM
Edited Oct 12, 2011 at 10:08 AM

To continue my last thread, here is the grammar written to capture variables:

	var variableName = new FreeTextLiteral("varName", FreeTextOptions.ConsumeTerminator, ":");
	var variableType = new FreeTextLiteral("varType", FreeTextOptions.ConsumeTerminator, ";");

	var varList = new NonTerminal("varList");
	varList.Rule = MakeStarRule(varList, variableName + variableType + NewLinePlus);

	var var_block = new NonTerminal("var_block", "VAR" + NewLinePlus + varList + "END_VAR" + NewLinePlus);

Sample input:

VAR
MESSAGE:STRING80;

END_VAR

 

Basically, the structure of variable declaration is (in C# Regex format):

VAR(\r?\n)+(?<variable_name>\w+):(?<variable_type>\w+);(\r?\n)+END_VAR(\r?\n)

The regex expression gives me correct variable type and name. What is wrong in my Irony grammar?
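For reference, the regex check described above can be reproduced with a small standalone program. This is only a sketch: the class and method names (VarRegexDemo, MatchVarBlock) are made up for illustration, and the input is the sample from this post with plain "\n" line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VarRegexDemo
{
    // The pattern quoted in the post, unchanged.
    public const string Pattern =
        @"VAR(\r?\n)+(?<variable_name>\w+):(?<variable_type>\w+);(\r?\n)+END_VAR(\r?\n)";

    // Hypothetical helper: runs the pattern against the given text.
    public static Match MatchVarBlock(string input)
    {
        return Regex.Match(input, Pattern);
    }

    public static void Main()
    {
        // Sample input from the post: one declaration, then an empty line.
        string input = "VAR\nMESSAGE:STRING80;\n\nEND_VAR\n";
        Match match = MatchVarBlock(input);

        if (match.Success)
        {
            Console.WriteLine(match.Groups["variable_name"].Value); // MESSAGE
            Console.WriteLine(match.Groups["variable_type"].Value); // STRING80
        }
    }
}
```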

 

Hint: While debugging, I see that the variableName terminal is eating up the NewLinePlus terminal content.

Oct 12, 2011 at 12:13 PM
Edited Oct 12, 2011 at 12:14 PM

The whole input is:

PROGRAM
ABC_DEF
VAR
MESSAGE:STRING80;
END_VAR
TRANSITION
STEP_00
:=
END_TRANSITION
END_PROGRAM

The transition rule is:

            var transitionContent = new FreeTextLiteral("transitionContent", FreeTextOptions.ConsumeTerminator, "END_TRANSITION");
            var transition_block = new NonTerminal("transition_block",
                "TRANSITION" + NewLine
                + transitionContent
                + NewLinePlus);

I see that if I remove the ':=' text, there is no problem. It seems the parser doesn't know whether ':' belongs to a variable declaration or to some other block (TRANSITION).
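One possible reading of this symptom, assuming a free-text terminal simply scans forward to its next terminator without regard to line breaks (an assumption about the scanner, not something confirmed in this thread), can be simulated without Irony at all. The helper TryScanUntil below is hypothetical and only mimics "read until the terminator".

```csharp
using System;

public static class FreeTextScanSketch
{
    // Hypothetical stand-in for a "read until terminator" terminal:
    // yields the text between 'start' and the next 'terminator', if any.
    public static bool TryScanUntil(string input, int start, string terminator, out string text)
    {
        int end = input.IndexOf(terminator, start, StringComparison.Ordinal);
        text = end < 0 ? "" : input.Substring(start, end - start);
        return end >= 0;
    }

    public static void Main()
    {
        string input =
            "VAR\nMESSAGE:STRING80;\nEND_VAR\nTRANSITION\nSTEP_00\n:=\nEND_TRANSITION\n";
        int atEndVar = input.IndexOf("END_VAR", StringComparison.Ordinal);
        string text;

        // With ':=' present, a "variable name" scan starting at END_VAR
        // runs across several lines until it reaches the ':' of ':='.
        if (TryScanUntil(input, atEndVar, ":", out text))
            Console.WriteLine("Consumed: " + text.Replace("\n", "\\n"));

        // Without ':=' no later ':' exists, so the scan finds nothing
        // and END_VAR is left for the keyword to match.
        string withoutAssign = input.Replace(":=\n", "");
        atEndVar = withoutAssign.IndexOf("END_VAR", StringComparison.Ordinal);
        if (!TryScanUntil(withoutAssign, atEndVar, ":", out text))
            Console.WriteLine("No terminator found after END_VAR");
    }
}
```

If that assumption holds, it would explain why removing ':=' makes the problem disappear: the stray ':' is what lets the free-text scan succeed past END_VAR.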

Oct 12, 2011 at 1:05 PM

According to your regular expression, variable_name and variable_type do not include whitespace, so my question is: why are you not just using an IdentifierTerminal instead of the FreeTextLiteral?

var variableName = new IdentifierTerminal("varName", IdOptions.AllowsEscapes | IdOptions.CanStartWithEscape); // Not sure what IdOptions you need here
var variableType = new IdentifierTerminal("varType", IdOptions.AllowsEscapes | IdOptions.CanStartWithEscape);

var varList = new NonTerminal("varList");
varList.Rule = MakeStarRule(varList, variableName + ":" + variableType + ";" + NewLinePlus);  
// Does your grammar really require line breaks, or can you put multiple variable declarations on one line? Seems odd to require the semicolon if the line break is required.
var var_block = new NonTerminal("var_block", "VAR" + NewLinePlus + varList + "END_VAR" + NewLinePlus);
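To try the rules above end to end, they can be wrapped in a small Grammar subclass and run through a parser. This is a minimal sketch, assuming the Irony.Parsing namespace and the Parser, ParseTree and LogMessage members as I remember them; the class name VarBlockGrammar and the extra varDecl non-terminal are introduced here for illustration only.

```csharp
using System;
using Irony.Parsing;

// Hypothetical grammar covering only the VAR block.
public class VarBlockGrammar : Grammar
{
    public VarBlockGrammar()
    {
        // Same identifier options as suggested in the post above.
        var variableName = new IdentifierTerminal("varName", IdOptions.AllowsEscapes | IdOptions.CanStartWithEscape);
        var variableType = new IdentifierTerminal("varType", IdOptions.AllowsEscapes | IdOptions.CanStartWithEscape);

        var varDecl = new NonTerminal("varDecl");
        var varList = new NonTerminal("varList");
        var varBlock = new NonTerminal("var_block");

        // One declaration per line: name ":" type ";" then one or more line breaks.
        varDecl.Rule = variableName + ":" + variableType + ";" + NewLinePlus;
        varList.Rule = MakeStarRule(varList, varDecl);
        varBlock.Rule = "VAR" + NewLinePlus + varList + "END_VAR" + NewLinePlus;

        Root = varBlock;
    }
}

public static class Program
{
    public static void Main()
    {
        var parser = new Parser(new VarBlockGrammar());
        ParseTree tree = parser.Parse("VAR\nMESSAGE:STRING80;\n\nEND_VAR\n");

        if (tree.HasErrors())
        {
            // Print whatever the parser reported, with source locations.
            foreach (var message in tree.ParserMessages)
                Console.WriteLine(message.Location + ": " + message.Message);
        }
        else
        {
            Console.WriteLine("Parsed without errors");
        }
    }
}
```

Printing ParserMessages is usually the quickest way to see which terminal the parser expected at the point of failure.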

Oct 12, 2011 at 1:56 PM

why are you not just using an IdentifierTerminal instead of the FreeTextLiteral?

A valid question! The answer is silly: because I didn't know! I am a beginner with Irony. It's been only a week that I have been using it. But I'm learning. :)

// Does your grammar really require line breaks, or can you put multiple variable declarations on one line? Seems odd to require the semicolon if the line break is required

It's odd indeed, but I can't change the grammar.

 

MindCore, IdentifierTerminal works! Thanks so much for helping me out!!

 

Regards

Nayan

Oct 12, 2011 at 1:59 PM
Edited Oct 12, 2011 at 2:23 PM

Can you also please point me in the right direction on how to allow whitespace within a variable declaration?

For example, as in this regex pattern: (?<varname>\w+)\s*:\s*(?<vartype>\w+)\s*(\r?\n)

What would be the equivalent of the whitespace (\s*) parts in Irony?

 

Gah! I played around a little and found that whitespace is already ignored in the 'NextToken()' function. :)

Oct 12, 2011 at 3:06 PM

I'm glad it worked for you!

Before I am able to answer your next question, though, it would help to know your intent. What I mean is: once you have your parser built, do you plan on using the content, or is it throwaway data?

From your examples in this post and the previous post that Roman responded to, it appears that everything has a pretty specific purpose. If this is the case, I believe you need to identify each pattern and stay away from using things like FreeTextLiteral and RegexBasedTerminal if possible. So, let me illustrate.

In your last post you had the following test case scenario:

VAR
MESSAGE:STRING80;
(*_ORError Message*)
END_VAR

Here is an example of how I would expect your grammar to look.
Note that I have set the grammar to recognize, yet ignore, line breaks because they don't really seem relevant (I could be wrong here).


LineTerminators = "\r\n\u0085\u2028\u2029"; // CR, line feed, next line (NEL), line separator, paragraph separator

// White space, formed from spaces (U+0020), carriage returns (U+000D), and newlines (U+000A),
// is ignored except as it separates tokens that would otherwise combine into a single token.
WhitespaceChars = " \r\n"; // space, CR and LF are skipped as whitespace

var variableName = new IdentifierTerminal("varName", IdOptions.AllowsEscapes | IdOptions.CanStartWithEscape); 
var variableType = new IdentifierTerminal("varType", IdOptions.AllowsEscapes | IdOptions.CanStartWithEscape);

var var_error_msg = new QuotedValueLiteral("varErrorMsg", "(*_OR", "*)", TypeCode.String); // I can't recall, but this should include whitespace

var var_block = new NonTerminal("varBlock");
var var_list = new NonTerminal("varList");
var var_list_element = new NonTerminal("varListElement");
var var_line = new NonTerminal("varLine");

var_block.Rule = "VAR" + var_list + "END_VAR";

var_list.Rule = MakeStarRule(var_list, var_list_element); 

var_list_element.Rule = var_line | var_error_msg;

var_line.Rule = variableName + ":" + variableType + ";";

Let me know if I am leading you in the right direction or if I am totally wrong here.

Best Regards,
Mindcore
 



Oct 12, 2011 at 3:43 PM
Edited Oct 12, 2011 at 3:55 PM

I really appreciate your effort, Mindcore, to go to this extent :)

You're in the right direction, but I won't blame you for minor mistakes as you're not familiar with the DSL I'm working on.
(* and *) are comment markers, like /* and */ in C/C++/C#. The statements are single line statements only. ';' is there to end the line for the variables - required to be parsed, but doesn't make sense.

These rules are working for me (they are the ones you gave me):

            var variableName = new IdentifierTerminal("varName", IdOptions.AllowsEscapes | IdOptions.CanStartWithEscape | IdOptions.IsNotKeyword);
            var variableType = new IdentifierTerminal("varType", IdOptions.AllowsEscapes | IdOptions.CanStartWithEscape | IdOptions.IsNotKeyword);

            var varList = new NonTerminal("varList");
            varList.Rule = MakeStarRule(varList, variableName + ":" + variableType + ";" + NewLinePlus);

            var var_block = new NonTerminal("var_block", "VAR" + NewLinePlus + varList + "END_VAR" + NewLinePlus);
            var var_alias_block = new NonTerminal("var_alias_block", "VAR_ALIAS" + NewLinePlus + varList + "END_VAR" + NewLinePlus);

I'm ignoring the comment because I recently learnt that the comments won't be there in input.

I should use CommentTerminal instead of QuotedValueLiteral, which makes more sense in this case, IMO!

I need to learn more about LineTerminators and WhitespaceChars (how to use, etc.). Is there any example available anywhere that you know (code/website/forum)?

Oct 12, 2011 at 6:08 PM

First, I would agree that if the (* *) block is a comment, then use the CommentTerminal (if needed of course).

CommentTerminal comment = new CommentTerminal ("Comment", "(*", "*)");

NonGrammarTerminals.Add (comment); // if it's not to be included in your tree, add this

As for LineTerminators and WhitespaceChars, these are two of the Grammar's properties (like the NonGrammarTerminals above) that have default settings and are used by the underlying logic. Without looking back through the code, if I recall correctly, both are strings that are treated as arrays of characters. When Irony parses its input, it uses LineTerminators when determining the NewLine, NewLinePlus, and NewLineStar terminals, and it uses WhitespaceChars to know which characters to ignore completely unless a rule explicitly says they should be there.
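As a rough illustration of how these pieces fit together, here is a sketch of a grammar in which line breaks carry no meaning. It assumes the LineTerminators and WhitespaceChars members are settable from the grammar constructor, as in the builds discussed in this thread (later releases may expose them differently); the class name FreeFormVarGrammar is made up for the example.

```csharp
using Irony.Parsing;

// Hypothetical grammar: VAR block where line breaks are plain whitespace.
public class FreeFormVarGrammar : Grammar
{
    public FreeFormVarGrammar()
    {
        // Assumption: both members exist on Grammar as plain strings.
        LineTerminators = "\r\n";     // characters treated as line ends
        WhitespaceChars = " \t\r\n";  // characters skipped between tokens

        // (* ... *) comments are scanned but kept out of the parse tree.
        var comment = new CommentTerminal("Comment", "(*", "*)");
        NonGrammarTerminals.Add(comment);

        var variableName = new IdentifierTerminal("varName");
        var variableType = new IdentifierTerminal("varType");

        var varLine = new NonTerminal("varLine");
        var varList = new NonTerminal("varList");
        var varBlock = new NonTerminal("varBlock");

        // No NewLine terms are used in any rule, so a declaration may be
        // followed by a line break, a space, or another declaration.
        varLine.Rule = variableName + ":" + variableType + ";";
        varList.Rule = MakeStarRule(varList, varLine);
        varBlock.Rule = "VAR" + varList + "END_VAR";

        Root = varBlock;
    }
}
```

The trade-off is that such a grammar can no longer enforce "one declaration per line"; if the line structure matters, keep NewLinePlus in the rules instead.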

 

The best source for figuring out how things work is to play with the provided samples. Roman has provided a good set of example grammars that each have something special. I honestly started with the GWBasic grammar because I find that language pretty easy, and then moved up to the Java and CSharp grammars, which are a bit more complex.

 

Oct 12, 2011 at 6:47 PM

My DSL is easier than GWBasic :)

I'm in a time crunch, actually. That's why there are so many questions. Otherwise, I would love to spend hours understanding this project via the samples. This project is really excellent, I must say. :)

Thank you MindCore for helping me out so much!

 

Regards

Nayan