SQLite Grammar

Nov 8, 2014 at 7:26 AM
I'm trying to create a grammar file for SQLite. I'm following the syntax diagram that they provide. https://www.sqlite.org/syntaxdiagrams.html

I'm getting 57 conflicts. I'll paste the grammar file here so you can run it yourself and see.

I honestly don't understand the errors and have no clue where to start on fixing them. I can click the error to see the Parser States, but I don't understand what it's saying.

There are basically 2 errors: reduce-reduce and shift-reduce. If I can understand what's going on and how to fix one of each, hopefully I can go through the rest by myself.

The grammar file is actually too large to paste here, so I made a gist for it. https://gist.github.com/JoshClose/b1635db1ba55129e0a29
Coordinator
Nov 12, 2014 at 7:16 PM
You have to understand that formal grammar description for a language is a BNF for explaining the language to humans; however, to parse it, it usually should be organized differently. Note that a single language can be described by multiple grammars, differently structured. Different parsing methods (LR, LL, LALR) require different types of grammar, and in most cases the human-readable description grammar is not usable by parse engine.
So you should not expect that direct translation of these diagrams into Irony's grammar class results in workable grammar. The good thing is that Irony gives you detailed description and locations of things it does not like/accept.
Google/bing LR/LALR, parsing conflicts, Reduce/Reduce and Shift/Reduce conflicts in particular, and how to fix them - ie how to reorganize non-terminal rules. There's plenty of info out there. Or, if you're new to this field, start with a good book ("Parsing Techniques" is the best IMO), you have to get a good knowledge of how LARL parsing works before you can use Irony. Sorry, no easy workaround.
Good luck
Roman
Nov 12, 2014 at 7:59 PM
Hi Roman.

I figured there wasn't an easy fix; I just had no idea where to go with this. I will do some more reading and search for the topics you mentioned. I will probably pick up the book too because I know little to nothing about this, but want to learn.

I just need Irony for doing code completion, so I'm hoping I will be able to write a small simple grammar that recognizes a few things. Basically, the completion window would need to pop after a few keywords, such as ".", "from", etc. At that point, I need to know where they are in the syntax tree to determine the completion list to give the user based on the context. i.e. Show table names, column names, functions, etc. I figured since they have a diagram of the whole language, why not create a grammar for the whole thing. I did notice the SQL 89 grammar was a lot smaller than the one I was coming up with. The SQLite grammar can't be too far off of SQL 89.

Thanks again for the direction. I'm sure I'll have more questions in the future.

Josh
Nov 12, 2014 at 9:47 PM
Roman, I just found the BNF for SQLite. I should be able to port this directly to Irony grammar, correct? http://www.sqlite.org/docsrc/doc/trunk/art/syntax/all-bnf.html
Coordinator
Nov 13, 2014 at 1:55 AM
again, you can port it as-is, but it might need to be modified for LALR method (most likely)
Nov 13, 2014 at 6:06 AM
Thanks again. I will read up on the LALR method and parsing in general.
Nov 16, 2014 at 8:24 PM
I read through how LALR works and shift/reduce and reduce/reduce conflicts. I have a better understanding of things, so I will have some specific questions along the way...

The BNF states (slimmed down)
expr    ::= [ [ database-name . ] table-name . ] column-name
expr    ::= <expr> binary-operator <expr>
I interpreted it as this (slimmed down):
var OR = ToTerm( "||" );

var idSimple = TerminalFactory.CreateSqlExtIdentifier( this, "idSimple" );
var expr = new NonTerminal( "expr" );
var binaryOperator = new NonTerminal( "binaryOperator" );

Root = expr;

binaryOperator.Rule = OR;
expr.Rule = idSimple
            | expr + binaryOperator + expr;
I get a shift/reduce error here. Shift-reduce conflict. State S6, lookaheads [||]. Selected shift as preferred action.
State S6 (Inadequate)
  Shift-reduce conflicts on inputs: ||
  Shift items:
    expr -> expr ·binaryOperator expr 
    binaryOperator -> ·|| 
  Reduce items:
    expr -> expr binaryOperator expr · [EOF ||]
  Transitions: binaryOperator->S4, ||->S5
I can solve this by adding ReduceHere().
var OR = ToTerm( "||" );

var idSimple = TerminalFactory.CreateSqlExtIdentifier( this, "idSimple" );
var expr = new NonTerminal( "expr" );
var binaryOperator = new NonTerminal( "binaryOperator" );

Root = expr;

binaryOperator.Rule = OR;
expr.Rule = idSimple
            | expr + binaryOperator + expr + ReduceHere();
My question is, is this the right thing to do here? I'm trying to further my understanding of this.

My thought is when it sees the binary operator || on a look ahead, it doesn't know if it should shift, meaning it sees it as a || or if it should reduce, meaning it sees the beginning of another expression. Is that correct thinking?

When putting the ReduceHere() at the end, what is that saying?
Coordinator
Nov 20, 2014 at 7:03 AM
Did you set operator precedence for binary operators? (and associativity) Looks like you didn't, that the cause of the conflict. Using ReduceHere hint is a wrong thing to do, definitely
Nov 22, 2014 at 9:05 PM
Thanks. Setting the preference worked, along with a couple other things. I'm using the supplied SqlGrammar to help.

I'm adding in COLLATE, which is a postfix unary operator.
var dot = ToTerm( "." );
var IS = ToTerm( "IS" );
var NOT = ToTerm( "NOT" );
var IN = ToTerm( "IN" );
var LIKE = ToTerm( "LIKE" );
var GLOB = ToTerm( "GLOB" );
var MATCH = ToTerm( "MATCH" );
var REGEXP = ToTerm( "REGEXP" );
var AND = ToTerm( "AND" );
var OR = ToTerm( "OR" );
var COLLATE = ToTerm( "COLLATE" );

var idSimple = TerminalFactory.CreateSqlExtIdentifier( this, "idSimple" );
var id = new NonTerminal( "id" );
var expr = new NonTerminal( "expr" );
var binaryOperator = new NonTerminal( "binaryOperator" );
var unaryOperator = new NonTerminal( "unaryOperator" );
var binaryExpression = new NonTerminal( "binaryExpression" );
var unaryExpression = new NonTerminal( "unaryExpression" );
var collateExpression = new NonTerminal( "collateExpression" );

Root = expr;

id.Rule = MakePlusRule( id, dot, idSimple );
unaryOperator.Rule = ToTerm( "-" ) | "+" | "~" | "NOT";
binaryOperator.Rule = ToTerm( "||" ) | "*" | "/" | "%" | "+" | "-" | "<<" | ">>" | "&" | "|" | "<" | "<=" | ">" | ">=" | "=" | "==" | "!=" | "<>"
    | IS | /*IS + NOT |*/ IN | LIKE | GLOB | MATCH | REGEXP | AND | OR;

unaryExpression.Rule = unaryOperator + expr;

binaryExpression.Rule = expr + binaryOperator + expr;

collateExpression.Rule = expr + COLLATE + idSimple;

expr.Rule = id
            | unaryExpression
            | binaryExpression
            | collateExpression
    ;

RegisterOperators( 10, "||" );
RegisterOperators( 9, "*", "/", "%" );
RegisterOperators( 8, "+", "-" );
RegisterOperators( 7, "<<", ">>", "&", "|" );
RegisterOperators( 6, "<", "<=", ">", ">=" );
RegisterOperators( 5, "=", "==", "!=", "<>" );
RegisterOperators( 5, IS, /*IS + NOT,*/ IN, LIKE, GLOB, MATCH, REGEXP );
RegisterOperators( 4, "AND" );
RegisterOperators( 3, "OR" );

MarkTransient( unaryOperator );
binaryOperator.SetFlag( TermFlags.InheritPrecedence );
I then get a conflict Shift-reduce conflict. State S42, lookaheads [COLLATE]. Selected shift as preferred action.

I solved this by putting a PreferShiftHere().
collateExpression.Rule = expr + PreferShiftHere() + COLLATE + idSimple;
This seems like what I would want to do. Is that correct? To me it looks like if it's at expr and does a look ahead and sees COLLATE that is should prefer doing a shift at this point.

I'm going to keep all my questions in this thread since they're all about the same grammar, if that's ok with you.
Nov 24, 2014 at 4:37 PM
I started over from scratch again (3rd time). I took a different approach this time. I'm going to work on the parts I want working first, and little by little add things in. Every step of the way I am running examples in the tool and looking at the tree too.

I learned something while doing things this way. I now understand what you mean by the BNF needs to be changed to work with LALR. There are several places I could just omit rules because they were already covered by another rule. Restating them in the BNF makes it more clear for a human when reading, but doesn't make sense in the grammar. I have found that if I run into a reduce/reduce error, I more than likely screwed something up majorly and probably have some duplication.

It's really hard to describe what all that means without doing it yourself, or going through a fairly large example. I haven't found anything on the web that has done a good job of this, unfortunately.

Things are looking up. I completed with "select-core" fully with no conflicts, and am able to run a bunch of tests and it seems to parse as expected.

This is all assuming my previous question (affirmation) is correct.

Thanks again for the help.
Coordinator
Nov 24, 2014 at 7:18 PM
well, happy for you, good luck in your quest.
Answering your question, about use of PreferShiftHere. Be very careful with this 'fixer' for conflicts. I can think about only ONE legitimate case for using it - 'dangling else' case. This hint is a way to tell parser that the conflict it sees in grammar is in fact language ambiguity, and there is an extra RULE in the language stating that 'else' always applies to the closest open 'if' statement
(like 'if ... then if ... then ... else...' - the 'else' applies to second if; this is ambiguity of the language itself (!) resolved by an extra rule)

And PreferShiftHere is an instruction expressing this non-BNF rule in the language.
You see what I mean? Use PreferShiftHere to express some clear non-BNF rule 'of thumb' in the language that is hard or impossible to express in BNF. But do NOT use it to fix grammar ambiguities - that most likely are result of bad BNF composition, and not essential ambiguity in the language

Roman
Nov 24, 2014 at 7:44 PM
Great, thanks for the info! I will have to dig into the 'dangling else' case a little more so I can understand it better, and see if I'm running into that. For the most part, a shift/reduce conflict has shown me there is an issue with the grammar, and more than likely, the scenario is already accounted for.

I currently have PreferShiftHere() in 2 spots. I believe one is hard to express, and the other is a grammar ambiguity. I will have work through it and see if it's grammar or language ambiguity, and if I can remove it.

I updated the gist with the current state, if you want to take a look. I'm sure you'll be able to tell right away if either of the uses are appropriate.

https://gist.github.com/JoshClose/b1635db1ba55129e0a29
Nov 29, 2014 at 6:08 AM
I was able to remove that second PreferShiftHere() and it didn't break the rule. Again, it pointed out duplicate rules. :)