
Creating a parser for the Lucene query syntax

Apr 29, 2014 at 1:41 PM
I'm attempting to use Irony to build a parser for the Lucene Query Syntax (see the links at the end for more info). The current version of my grammar is here.

As a starting point, I'm trying to implement this BNF definition:
Query -> Clause (And Clause | Or Clause | NotClause | Clause)*;
NotClause -> Not Clause;
Clause -> (SubClause | Term);
SubClause -> (PLUS Query) | (MINUS Query) | (OPEN_PAREN Query CLOSE_PAREN);

Term -> Range | QualifiedTerm | UnqualifiedTerm;
QualifiedTerm -> FIELD_NAME ( Range | TEXT_VALUE | STRING_LITERAL );
Range -> OPEN_SQUARE UnqualifiedTerm TO UnqualifiedTerm CLOSE_SQUARE;
UnqualifiedTerm -> (STRING_LITERAL | TEXT_VALUE);
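The terminals named in the BNF above (FIELD_NAME, TEXT_VALUE, STRING_LITERAL, OPEN_SQUARE, ...) can be prototyped outside Irony with a small regex lexer before being ported to the grammar. This is a Python sketch, not Irony code. The character classes are assumptions, not Lucene's official lexical rules, and FIELD_NAME is recognised here simply as a word immediately followed by a colon.

```python
import re
from typing import List, Tuple

# Token names follow the terminals in the BNF above; the regexes are
# assumptions for illustration, not Lucene's official lexical rules.
TOKEN_SPEC = [
    ("WS", r"\s+"),
    ("STRING_LITERAL", r'"[^"]*"'),
    ("FIELD_NAME", r"[A-Za-z_][A-Za-z0-9_]*:"),  # a word immediately followed by ':'
    ("OPEN_PAREN", r"\("),
    ("CLOSE_PAREN", r"\)"),
    ("OPEN_SQUARE", r"\["),
    ("CLOSE_SQUARE", r"\]"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("TEXT_VALUE", r'[^\s()\[\]":+\-][^\s()\[\]":]*'),
]
# Upper-case only, as Lucene requires for its boolean operators.
KEYWORDS = {"AND", "OR", "NOT", "TO"}
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split a query into (kind, value) pairs, skipping whitespace."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        m = MASTER_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue
        if kind == "FIELD_NAME":
            value = value[:-1]  # drop the trailing ':'
        elif kind == "TEXT_VALUE" and value in KEYWORDS:
            kind = value  # AND / OR / NOT / TO become their own token kinds
        tokens.append((kind, value))
    return tokens
```

For example, `tokenize("age:123")` returns `[("FIELD_NAME", "age"), ("TEXT_VALUE", "123")]`. Running it on `title:(Do AND it)` gives FIELD_NAME followed by OPEN_PAREN. The QualifiedTerm rule as written does not accept that sequence.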
But I'm struggling with the definition of the top-level query. I'm not sure how to properly build (optional) BinaryExpressions with an implied OR operator, so that two clauses with no operator between them are treated as if joined by OR. Can anyone give me any pointers?
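One way to picture the implied OR is a hand-written recursive-descent parser for the top-level Query rule. The sketch below is Python, not Irony. The tuple-based AST node names (`REQUIRED`, `PROHIBITED`, and so on) are hypothetical. The tokenizer is deliberately crude, so `name:bob` stays one opaque term. It also assumes `+` and `-` bind to the next Clause only, which differs from the `PLUS Query` / `MINUS Query` alternatives in the SubClause rule above.

```python
import re
from typing import List, Optional, Union

# Hypothetical AST for illustration: a term is a plain string, and operators are
# tuples such as ("OR", left, right), ("AND", left, right), ("NOT", left, right)
# meaning "left but not right", plus ("REQUIRED", x) / ("PROHIBITED", x) for +x / -x.
Node = Union[str, tuple]

# Quoted phrases, parentheses, +/- prefixes, and any other run of characters.
TOKEN_RE = re.compile(r'"[^"]*"|[()+\-]|[^\s()"+\-]+')
BINARY_OPS = {"AND", "OR", "NOT"}


class QueryParser:
    def __init__(self, text: str) -> None:
        self.tokens: List[str] = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of query")
        self.pos += 1
        return tok

    def parse(self) -> Node:
        node = self.query()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    def query(self) -> Node:
        # Query -> Clause (AND Clause | OR Clause | NOT Clause | Clause)*
        # The loop folds left to right; the flat BNF defines no precedence.
        node = self.clause()
        while self.peek() not in (None, ")"):
            if self.peek() in BINARY_OPS:
                op = self.take()  # explicit AND / OR / NOT
            else:
                op = "OR"  # two clauses side by side: implied OR
            node = (op, node, self.clause())
        return node

    def clause(self) -> Node:
        # Clause -> "(" Query ")" | "+" Clause | "-" Clause | term
        tok = self.take()
        if tok == "(":
            node = self.query()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok == "+":
            return ("REQUIRED", self.clause())
        if tok == "-":
            return ("PROHIBITED", self.clause())
        if tok in BINARY_OPS or tok == ")":
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok
```

For example, `QueryParser("jakarta apache website").parse()` returns `("OR", ("OR", "jakarta", "apache"), "website")`. The key point is the `else` branch in `query`. When the next token starts a clause instead of being an operator, the parser builds the same binary node it would have built for an explicit OR.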

These are the queries I'm currently testing with:
  • title:(+return +"pink panther")
  • "jakarta apache" -"Apache Lucene"
  • (jakarta OR apache) AND website
  • jakarta apache website matt name:bob age:123
  • +(Term:bar Term2:baz) +Term3:foo -Term4:rob
  • mod_date:[20020101 TO 20030101] Name:bob
  • title:(Do AND it)
  • title:"Do it right" AND (right:go OR matt:no)
There is some info on the grammar in these links:
Apr 29, 2014 at 5:29 PM
Look at the FullTextSearch grammar sample. Your grammar seems similar to the FTS/Google search language, so start from that grammar.