Simple grammar won't work

Sep 25, 2012 at 4:31 AM

I have the following:

RegexBasedTerminal name = new RegexBasedTerminal("name", "[a-zA-Z][a-zA-Z0-9_]*");
var plus = ToTerm("+");
var minus = ToTerm("-");

var myRule = new NonTerminal("myRule");
myRule.Rule = name + plus + "b" | "a" + minus + "b";
Root = myRule;


I want users to be able to write either any name followed by + b, or exactly a - b.

 

a - b should match the second alternative, but it fails to parse:

Stack Top   Input            Action
(S0)        a (name)         Shift to S2
a (name)    - (key symbol)   Syntax Error
a (name)    - (key symbol)   RECOVERING: popping stack, looking for state with error shift
(S0)        - (key symbol)   FAILED TO RECOVER.

 

I don't understand why it fails to recover. Also, is there a way to get better lookahead? I feel like it should have shifted to some intermediate state to tell whether it was a name or an a...

 

This is the simplest grammar I could create that expresses the problem I am having, so I can't simply make the regex exclude a and add an extra rule for a.
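The trace shows what happened: by the time the parser runs, Irony's scanner has already classified a as name, and in the state reached after a name the grammar only allows plus, so - is a syntax error. The recovery step is looking for a state that can shift an error token; a grammar without error-recovery productions presumably has none, hence FAILED TO RECOVER. The stand-alone C# sketch below is a simplified scanner, not Irony's (whose terminal-selection rules are more involved); the reserved-word set and the token kinds are assumptions for illustration. It shows the general point: once a token's kind is fixed, the parser's lookahead works on that kind, not on the raw text, so a only becomes distinguishable from other names if the scanner treats it as a keyword.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Splits the input into (Kind, Text) tokens. In this sketch the kind is decided
// here, once, from the characters alone; a parser consuming the tokens cannot change it.
List<(string Kind, string Text)> Scan(string input, ISet<string> reservedWords)
{
    var tokens = new List<(string Kind, string Text)>();
    // Identifiers, integers, "==", single-character operators; other characters (e.g. spaces) are skipped.
    foreach (Match m in Regex.Matches(input, @"[a-zA-Z][a-zA-Z0-9_]*|[0-9]+|==|[-+*/%()=]"))
    {
        string text = m.Value;
        string kind =
            reservedWords.Contains(text) ? "keyword" :   // reserved words win over identifiers
            char.IsLetter(text[0]) ? "name" :
            char.IsDigit(text[0]) ? "number" :
            "symbol";
        tokens.Add((kind, text));
    }
    return tokens;
}

// Without reserving "a", it is just another name, as in the trace above.
foreach (var (kind, text) in Scan("a - b", new HashSet<string>()))
    Console.WriteLine($"{kind,-8}{text}");

// Reserving "a" (a hypothetical choice) makes the scanner emit a keyword instead.
foreach (var (kind, text) in Scan("a - b", new HashSet<string> { "a" }))
    Console.WriteLine($"{kind,-8}{text}");
```

The same reserved-word idea is what would let true and false win over name later in this thread.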

Sep 25, 2012 at 10:51 AM
Edited Sep 25, 2012 at 10:52 AM

Your definition of name includes both a and b as valid matches. In essence, it seems you are asking the Scanner to be a Parser as well.

So, the question back to you is why the following grammar does not meet your needs (note the quantifier change in the definition of name):

using Irony.Parsing;

public class MyGrammar : Grammar {
    public MyGrammar() {
        var name   = new RegexBasedTerminal("name",   @"[a-zA-Z][a-zA-Z0-9_]+");
        var letter = new RegexBasedTerminal("letter", @"[a-bA-B]");

        var MyRule = new NonTerminal("MyRule");
        var BinOp  = new NonTerminal("BinOp");

        Root = MyRule;

        MyRule.Rule = name + "+" + letter
                    | letter + BinOp + letter;
        BinOp.Rule  = ToTerm("+") | "-";
    }
}
Sep 26, 2012 at 1:04 AM

I use name for things other than just this, so I would need to change all of them. Also, your name definition won't allow one-character names, will it?
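The question about one-character names is easy to check directly. This small C# snippet uses only System.Text.RegularExpressions and compares the suggested pattern (trailing +, so at least two characters) with the original * version. The ^ and $ anchors are added here only so that each candidate is tested as a whole word; a scanner terminal is applied at the current input position instead.

```csharp
using System;
using System.Text.RegularExpressions;

// Suggested pattern: a letter followed by ONE OR MORE letters/digits/underscores.
const string OneOrMoreTail = @"^[a-zA-Z][a-zA-Z0-9_]+$";
// Original pattern: a letter followed by ZERO OR MORE letters/digits/underscores.
const string ZeroOrMoreTail = @"^[a-zA-Z][a-zA-Z0-9_]*$";

foreach (string candidate in new[] { "a", "x", "ab", "x1_y" })
{
    bool withPlus = Regex.IsMatch(candidate, OneOrMoreTail);
    bool withStar = Regex.IsMatch(candidate, ZeroOrMoreTail);
    Console.WriteLine($"{candidate}: '+' pattern {withPlus}, '*' pattern {withStar}");
}
```

Single letters such as a and x match only the * version, so the suggested definition does exclude one-character names; that is what leaves single letters free for the separate letter terminal.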

I have a similar issue with defining a language that has variables and allows math, string, and boolean operations.  I want it to allow the following:

 

mathExpr = any math operation (+ - * / %) using numbers or names (the standard expression, term, and factor rules)

stringExpr = literalString | literalString + stringExpr | name | name + stringExpr

boolExpr = boolTerm | boolTerm + "or" + boolTerm | boolTerm + "and" + boolTerm

boolTerm = name | "true" | "false" | mathExpr + "==" + mathExpr | stringExpr + "==" + stringExpr | boolExpr + "==" + boolExpr

 

I am OK with it choosing either math or string when you write a == a (it can't know which; it would be nice if I could set a preference).

 

The problem arises when a boolTerm is expected and you type

 

x or a

 

it decides that the a is a stringExpr and won't parse.

Sep 26, 2012 at 9:06 AM

Your examples badly confuse the (very distinct) roles of Scanner, Parser, and Semantic Checker. Have you checked out any of the simple grammar examples available on the web and in Irony.Samples?

Sep 26, 2012 at 3:41 PM

After your reply to the first one, I noticed that I had oversimplified it and it no longer showed what I wanted. My issue is with the parser; the scanner is producing the correct tokens (in my real case, not in the first example).

 

I am making something that will evaluate expressions.  An expression can either be math, boolean, or string. All can use variables (names). 

 

var name = new RegexBasedTerminal("name", @"[a-zA-Z][a-zA-Z0-9_]+");

Math rules:

var number = new RegexBasedTerminal("number", "-?[0-9]+");
var mathExpr = new NonTerminal("mathExpr");
var mathTerm = new NonTerminal("mathTerm");
var mathFactor = new NonTerminal("mathFactor");

mathExpr.Rule = mathTerm | mathExpr + "+" + mathTerm | mathExpr + "-" + mathTerm;
mathTerm.Rule = mathFactor | mathTerm + "*" + mathFactor | mathTerm + "/" + mathFactor | mathTerm + "%" + mathFactor;
mathFactor.Rule = number | name | "(" + mathExpr + ")";
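The mathExpr / mathTerm / mathFactor layering is the standard way to encode precedence: each level only combines results from the level below it, so *, / and % bind tighter than + and -. Irony builds an LALR parser from rules like these; purely as a point of comparison, here is a hand-written recursive-descent evaluator in C# that follows the same three levels, with the left recursion turned into loops. The variable table is hypothetical, numbers are unsigned (the -? in the number terminal is left out), and there is no unary minus.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical values for the names that may appear in an expression.
var variables = new Dictionary<string, long> { ["x"] = 7, ["y"] = 3 };

long Evaluate(string source, IReadOnlyDictionary<string, long> vars)
{
    // Names, unsigned integers, operators and parentheses; unmatched characters (e.g. spaces) are skipped.
    var tokens = new List<string>();
    foreach (Match m in Regex.Matches(source, @"[a-zA-Z][a-zA-Z0-9_]*|[0-9]+|[-+*/%()]"))
        tokens.Add(m.Value);
    int pos = 0;

    string Peek() => pos < tokens.Count ? tokens[pos] : "";
    string Next() => pos < tokens.Count ? tokens[pos++] : throw new FormatException("unexpected end of input");

    // mathExpr := mathTerm { ("+" | "-") mathTerm }   (a loop instead of left recursion)
    long Expr()
    {
        long value = Term();
        while (Peek() is "+" or "-")
        {
            string op = Next();
            long rhs = Term();
            value = op == "+" ? value + rhs : value - rhs;
        }
        return value;
    }

    // mathTerm := mathFactor { ("*" | "/" | "%") mathFactor }
    long Term()
    {
        long value = Factor();
        while (Peek() is "*" or "/" or "%")
        {
            string op = Next();
            long rhs = Factor();
            value = op switch { "*" => value * rhs, "/" => value / rhs, _ => value % rhs };
        }
        return value;
    }

    // mathFactor := number | name | "(" mathExpr ")"
    long Factor()
    {
        string token = Next();
        if (token == "(")
        {
            long inner = Expr();
            if (Next() != ")") throw new FormatException("expected ')'");
            return inner;
        }
        if (char.IsDigit(token[0])) return long.Parse(token);
        if (char.IsLetter(token[0]))
            return vars.TryGetValue(token, out long v) ? v : throw new KeyNotFoundException($"unknown name '{token}'");
        throw new FormatException($"unexpected token '{token}'");
    }

    long result = Expr();
    if (pos != tokens.Count) throw new FormatException($"unexpected token '{tokens[pos]}'");
    return result;
}

Console.WriteLine(Evaluate("x + y * (2 - 1) % 2", variables)); // prints 8
```

Because the loops fold from the left, 10 - 3 - 2 evaluates as (10 - 3) - 2, matching the left-recursive rules.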

 

String rules:

var literal = TerminalFactory.CreateCSharpString("literal");
var strExpr = new NonTerminal("strExpr");

// For name + name this is preferred over math; Irony chooses it, but I would like to set the preference in the grammar myself
strExpr.Rule = literal | name | name + strExpr | literal + strExpr;

 

Boolean rules:

var test = new NonTerminal("test");
var boolExpr = new NonTerminal("boolExpr");
var boolFactor = new NonTerminal("boolFactor");
var boolTerm = new NonTerminal("boolTerm");

test.Rule = boolExpr + "==" + boolExpr | mathExpr + "==" + mathExpr | strExpr + "==" + strExpr;
boolExpr.Rule = boolExpr + "or" + boolTerm | boolTerm;
boolTerm.Rule = boolTerm + "and" + boolFactor | boolFactor;

// I need a way to say that true and false take precedence over name in the scanner
boolFactor.Rule = ToTerm("true") | "false" | name | "(" + boolExpr + ")" | test;

 

I tried writing

a or b

 

and it reports that == is expected: it parses b as a strExpr even though that can't succeed, whereas parsing it as a boolExpr would.

Coordinator
Sep 26, 2012 at 4:31 PM

I agree with pg - why don't you start with the expression grammar in the samples, play with it, see how it works, and then see what you need to change?! The main purpose of that sample is to give a working example and a starting point for guys like you. From your explanations I see that you need a basic math expression evaluator. Why reinvent the wheel - from scratch?!!!

Roman

Sep 26, 2012 at 5:05 PM

Be gentle Roman. This looks like a first assignment in a compiler course.

;-)

 

Sep 28, 2012 at 2:59 AM

This is actually part of a much larger language. The issue I am having is not covered by any of the samples (at least not in a basic way). There is a conflict because both math and string expressions can contain names (identifiers) combined with addition. I would not care which of the two valid interpretations it chose, but the issue is this:

 

a == "a" won't parse. It decides (prematurely) that a is a math expression and then expects math, not "a". For a == b it would be correct to choose either (though I would like it to be consistent). "a" == a works because it knows it is comparing strings.

 

I don't understand why it decides what a is before seeing the "a". I understand that the scanner should emit name (which is correct, and it does), but the parser should shift and wait for the "a" before reducing, and it doesn't.

Sep 28, 2012 at 9:21 AM
Edited Sep 28, 2012 at 9:24 AM

Type-checking belongs in the Checker, which here happens in the overridden AstNode.Init(..) method. Define a Type property in your AstNode subclass for each term, and on completion of each BinaryOp expression verify that the term types on each side are compatible. Errors can be reported just like regular parser errors, as shown below. See (http://irony.codeplex.com/discussions/396545) for an explanation of how I use a SourceSpan as a SourceLocation, or just use a SourceLocation instead.

context.Messages.Add(new LogMessage(ErrorLevel.Error,Children[Children.Count-1].Span,ex.Message,null));
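This suggestion amounts to parsing every operand with a single, untyped expression grammar and deciding string versus math afterwards, once the whole expression is known, so the parser never has to guess. Below is a minimal C# sketch of that second step, independent of Irony: the symbol table, the type names number/string/bool and the operator rules are all assumptions, and operands are plain token strings rather than AST nodes to keep it short. In an Irony-based implementation the same check would run for each binary-operator node (for example from the overridden AstNode.Init), and errors would go to context.Messages instead of being returned.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical declared types of variables (the checker's symbol table).
var symbols = new Dictionary<string, string> { ["a"] = "string", ["n"] = "number", ["flag"] = "bool" };

// Type of one operand token as the parser delivered it.
string TypeOfOperand(string token) => token switch
{
    "true" or "false" => "bool",
    _ when token.StartsWith('"') => "string",
    _ when char.IsDigit(token[0]) => "number",
    _ => symbols.TryGetValue(token, out var declared) ? declared : throw new KeyNotFoundException($"undeclared name '{token}'"),
};

// Result type of "left op right", or an error message if the operand types are incompatible.
(string? Type, string? Error) CheckBinary(string leftType, string op, string rightType) => op switch
{
    "+" when leftType == rightType && (leftType is "number" or "string") => (leftType, null),
    "-" or "*" or "/" or "%" when leftType == "number" && rightType == "number" => ("number", null),
    "and" or "or" when leftType == "bool" && rightType == "bool" => ("bool", null),
    "==" when leftType == rightType => ("bool", null),
    _ => (null, $"operator '{op}' cannot be applied to {leftType} and {rightType}"),
};

// a == "a" type-checks as a string comparison; mixed operand types are reported.
foreach (var (lhs, binOp, rhs) in new[] { ("a", "==", "\"a\""), ("n", "==", "\"a\""), ("flag", "or", "n") })
{
    var (type, error) = CheckBinary(TypeOfOperand(lhs), binOp, TypeOfOperand(rhs));
    Console.WriteLine(error ?? $"{lhs} {binOp} {rhs} : {type}");
}
```

A design note: because the types come from the symbol table rather than the grammar, a == b needs no preference rule at all; it is simply a comparison of whatever types a and b were declared with.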
Sep 28, 2012 at 3:55 PM

Thank you for all your help.  I will try this when I get home, but there is one thing I don't understand.  I tried two slightly differing grammars and one worked while the other did not.

Grammar 1 (works as expected: when something can be either a string or a math expression, it chooses string because that comes first in the rule)

var str = TerminalFactory.CreateCSharpString("str");
var number = TerminalFactory.CreateCSharpNumber("number");
var name = new RegexBasedTerminal("name", @"[a-zA-Z][a-zA-Z0-9_]+");

var strExpr = new NonTerminal("strExpr");
strExpr.Rule = name | str | name + strExpr | str + strExpr;

var mathExpr = new NonTerminal("mathExpr");
mathExpr.Rule = name | number | name + mathExpr | number + mathExpr;

var rootExpr = new NonTerminal("root");
rootExpr.Rule = strExpr | mathExpr;

Root = rootExpr;

 

Grammar 2 (always assumes that name is a string if it comes first in the expression, and won't accept a + 1; it expects a name or strExpr)

var str = TerminalFactory.CreateCSharpString("str");
var number = TerminalFactory.CreateCSharpNumber("number");
var name = new RegexBasedTerminal("name", @"[a-zA-Z][a-zA-Z0-9_]+");

var strExpr = new NonTerminal("strExpr");
strExpr.Rule = name | str | name + "+" + strExpr | str + "+" + strExpr;

var mathExpr = new NonTerminal("mathExpr");
mathExpr.Rule = name | number | name + "+" + mathExpr | number + "+" + mathExpr;

var rootExpr = new NonTerminal("root");
rootExpr.Rule = strExpr | mathExpr;

Root = rootExpr;

 

The only difference is whether there is a + symbol, and I feel like that should just add one extra shift to the grammar. Why won't it accept a + 1? Grammar 1 accepts both a 1 and a "a". Grammar 2 does accept 1 + a (it parses the 1 and concludes that a must be a number).