Non-structured grammar

Feb 22, 2011 at 6:37 PM

I'm trying to write a grammar that accepts unknown tokens in multiple places, so that I can construct database searches from spoken queries.

The basic structure of a query would be:

[ColorName] [ItemName]

  [in [CategoryName] ]

  [with [AttributeName] ]

  [and [AttributeName] ] . . .

ColorName comes from a list of known colors, but ItemName, CategoryName and AttributeName are all free-form phrases.

Is there a way in Irony to define tokens consisting of one or more unknown words, using the known words "in, with, and" as separators?

With my initial attempt at a grammar I can only get it to work with single-word unknowns.

Feb 22, 2011 at 9:31 PM

I assume the [CategoryName] phrase does not actually have brackets around it.

I think the best way to go is to represent CategoryName and the other names as a non-terminal that is a list of words (identifiers); you should also mark "in", "with" and "and" as reserved words.

So it will be like this:

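var word = new IdentifierTerminal("word");
var compoundName = new NonTerminal("compoundName");
var colorName = new NonTerminal("colorName");
var colorSpec = new NonTerminal("colorSpec");

// compoundName is a list of one or more words
compoundName.Rule = MakePlusRule(compoundName, word);
colorName.Rule = ToTerm("red") | "blue" | "green";
colorSpec.Rule = colorName + compoundName + "in" + compoundName + "with" + compoundName + "and" + compoundName;

// Reserved words cannot be used as identifiers, so they can act as separators
MarkReservedWords("in", "with", "and");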


If the with-clause is optional, you need to create a separate non-terminal for it:

withClause.Rule = "with" + compoundName | Empty; 

and use it in the expression for colorSpec. The same goes for the other clauses.

Hope this helps.

Feb 22, 2011 at 11:56 PM

Thank you! Yes, the brackets just indicate optional items.

I was doing some more work on this today and came up with this:

var search = new NonTerminal("search");
var conditionList = new NonTerminal("conditionList");
var condition = new NonTerminal("condition");
var nameCondition = new NonTerminal("nameCondition");
var categoryCondition = new NonTerminal("categoryCondition");
var keywordCondition = new NonTerminal("keywordCondition");

var term = CreateTerm("term");
var color = new NonTerminal("color");
var name = new NonTerminal("name");
var keyword = new NonTerminal("keyword");

MakePlusRule(name, term);

search.Rule = conditionList;
conditionList.Rule = MakePlusRule(conditionList, null, condition);
condition.Rule = color | nameCondition | categoryCondition | keywordCondition;

color.Rule = (ToTerm("red") | "green" | "blue" | "white" | "black" | "colorless");
nameCondition.Rule = name;
categoryCondition.Rule = (ToTerm("from") | "in") + name;
keywordCondition.Rule = (ToTerm("with") | "having" | "that" + (ToTerm("has") | "have" | "can" | "are" )) + name;

this.Root = search;

This code works, but the leaf nodes it produces are individual terms that must be concatenated after parsing. For example, take the input "students in main enrollment with special needs", where "main enrollment" is the name of a category and "special needs" is a keyword. Right now my grammar generates a categoryCondition node with two term nodes, "main" and "enrollment", and a keywordCondition node with two term nodes, "special" and "needs". In both cases the term nodes must be concatenated by post-processing code before they can be used. I am trying to make the parser generate a categoryCondition node with the value "main enrollment" and a keywordCondition node with the value "special needs", but I'm not sure this is possible if categoryCondition and keywordCondition are composed of smaller terms.

I will try the approach you mentioned above, and I greatly appreciate your advice!

Feb 23, 2011 at 1:14 AM

Well, in my solution it is the same: you'll get a list of identifier nodes that must be concatenated in post-processing. I don't think it is actually a big deal, and I don't think any alternative would be any better or easier. At least, it wouldn't be easier to "explain to the parser" that we need concatenated stuff :)
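
For what it's worth, the concatenation step is only a few lines. Below is a minimal sketch, assuming the tree shape described earlier in this thread: a name node whose children are single-word leaves. The Node class and PhraseJoiner are hypothetical stand-ins made up for illustration, not Irony types; with Irony you would walk the real parse tree nodes in the same way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a parse tree node (NOT Irony's ParseTreeNode).
public sealed class Node
{
    public string Name { get; }
    public string Text { get; }   // non-null only on leaf (single-word) nodes
    public List<Node> Children { get; } = new List<Node>();

    public Node(string name, string text = null, params Node[] children)
    {
        Name = name;
        Text = text;
        Children.AddRange(children);
    }
}

public static class PhraseJoiner
{
    // Collects the text of every leaf under the node, left to right,
    // and joins the words with single spaces.
    public static string JoinWords(Node node)
    {
        return string.Join(" ", Leaves(node).Select(leaf => leaf.Text));
    }

    // Depth-first walk that yields only leaves carrying text.
    private static IEnumerable<Node> Leaves(Node node)
    {
        if (node.Children.Count == 0)
        {
            if (node.Text != null) yield return node;
            yield break;
        }
        foreach (var child in node.Children)
            foreach (var leaf in Leaves(child))
                yield return leaf;
    }
}
```

Calling PhraseJoiner.JoinWords on a name node with the leaves "main" and "enrollment" gives back the single string "main enrollment", which can then be used as the category value. Because the walk is recursive, it does not matter whether the list is flat or nested.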