Failure to Recognise String

Sep 3, 2013 at 7:43 PM
I'm trying to parse a grammar with the following (simplified) BNF rules:

ConstantExpression = number | Date | QuotedString
Date = Year/Month/Day | Year/Month/Day Hour:Minute:Second
QuotedString = 'string'
Year etc = number

Here is my code:
 public TestGrammar()
  {
    
    var number = new NumberLiteral( "number" );    
    var str = new RegexBasedTerminal( "str", "[a-zA-Z0-9 ]*" );

    var ConstantExpression  = new NonTerminal( "ConstantExpression" );   
    var QuotedString        = new NonTerminal( "QuotedString" );
    var Year                = new NonTerminal( "Year" );
    var Month               = new NonTerminal( "Month" );
    var Day                 = new NonTerminal( "Day" );
    var Hour                = new NonTerminal( "Hour" );
    var Minute              = new NonTerminal( "Minute" );
    var Second              = new NonTerminal( "Second" );
    var Date                = new NonTerminal( "Date" );   
        
    QuotedString.Rule = ToTerm( "'" ) + str + ToTerm( "'" ); 

    Year.Rule   = number;    
    Month.Rule  = number;    
    Day.Rule    = number;    
    Hour.Rule   = number;    
    Minute.Rule = number;    
    Second.Rule = number;    

    Date.Rule = ToTerm( "'" ) + Year + ToTerm( "/" ) + Month + ToTerm( "/" ) + Day + ToTerm( "'" )
              | ToTerm( "'" ) + Year + ToTerm( "/" ) + Month + ToTerm( "/" ) + Day + ToTerm( " " ) 
                + Hour + ToTerm( ":" ) + Minute + ToTerm( ":" ) + Second + ToTerm( "'" );
    
    ConstantExpression.Rule = number | Date | QuotedString;   

    this.Root = ConstantExpression;
          
  }
This works fine for numbers and QuotedStrings, but fails for dates, e.g.:
'12/12/12': Syntax error, expected: '

Shouldn't the parser backtrack and attempt to recognise the text using Date if QuotedString fails?

Many thanks,

James
Coordinator
Sep 3, 2013 at 9:18 PM
Get rid of the Year, Month, Day, etc. non-terminals and use 'number' directly.
Sep 4, 2013 at 8:20 AM
Thanks for the suggestion, but that doesn't seem to be helping:
public TestGrammar()
  {
    
    var number  = new NumberLiteral( "number" );    
    var str     = new RegexBasedTerminal( "str", "[a-zA-Z0-9 ]*" );

    var ConstantExpression  = new NonTerminal( "ConstantExpression" );   
    var QuotedString        = new NonTerminal( "QuotedString" );    
    var Date                = new NonTerminal( "Date" );   
        
    QuotedString.Rule = ToTerm( "'" ) + str + ToTerm( "'" );     

    Date.Rule = ToTerm( "'" ) + number + ToTerm( "/" ) + number + ToTerm( "/" ) + number + ToTerm( "'" )
              | ToTerm( "'" ) + number + ToTerm( "/" ) + number + ToTerm( "/" ) + number + ToTerm( " " ) 
                + number + ToTerm( ":" ) + number + ToTerm( ":" ) + number + ToTerm( "'" );
    
    ConstantExpression.Rule = number | Date | QuotedString;   

    this.Root = ConstantExpression;
          
  }
'12/12/12': Syntax error, expected: '

Any other ideas? BTW, very much liking Irony so far - many thanks!
Coordinator
Sep 4, 2013 at 9:16 AM
Define a separate non-terminal for Time, and define a DateExt non-terminal as Date + TimeOpt,
where TimeOpt.Rule = Time | Empty;
On a general note, I suggest treating both dates and strings as quoted strings. The content of the string can then be checked in the ValidateToken event of the string terminal. Your handler should analyze the content and, if it is a date, replace the token's Terminal with the Date terminal.
Roman
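For readers following along, the Date + TimeOpt suggestion above might be sketched roughly as follows. This is only an illustration of the rule shape, not code posted in the thread: the class name DateGrammarSketch is hypothetical, the placement of the quotes on DateExt is an assumption, and QuotedString is left out entirely, so it does not by itself show how dates and plain strings coexist.

```csharp
using Irony.Parsing;

// Hypothetical grammar class; it only illustrates the Date + TimeOpt rule shape.
public class DateGrammarSketch : Grammar
{
  public DateGrammarSketch()
  {
    var number  = new NumberLiteral( "number" );

    var Date    = new NonTerminal( "Date" );
    var Time    = new NonTerminal( "Time" );
    var TimeOpt = new NonTerminal( "TimeOpt" );
    var DateExt = new NonTerminal( "DateExt" );

    Date.Rule    = number + ToTerm( "/" ) + number + ToTerm( "/" ) + number;
    Time.Rule    = number + ToTerm( ":" ) + number + ToTerm( ":" ) + number;

    // The time part is optional: either a Time or nothing at all.
    TimeOpt.Rule = Time | Empty;

    // Assumption: the surrounding quotes belong to DateExt rather than to Date.
    DateExt.Rule = ToTerm( "'" ) + Date + TimeOpt + ToTerm( "'" );

    this.Root = DateExt;
  }
}
```

Factoring the optional time into TimeOpt means the shared prefix (quote, year, month, day) is written once instead of being repeated in two alternatives, as it was in the original Date rule.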
Sep 5, 2013 at 3:25 PM
Thanks for the alternative suggestion. For the benefit of others, here is what I am now using:
using System.Collections.Generic;
using System.Text.RegularExpressions;
using Irony.Parsing;

public class TestGrammar : Irony.Parsing.Grammar 
{

  private Dictionary< string, Terminal > m_TerminalOverrides = new Dictionary< string, Terminal >();

  public TestGrammar()
  {
    var datetime            = new Terminal( "datetime" );
    var str                 = new Terminal( "str" );
    var number              = new NumberLiteral( "number" );    
    var quotedStringOrDate  = new RegexBasedTerminal( "quotedString", @"'[\w /\:\.\+\-]*'" );

    m_TerminalOverrides[ datetime.Name ]  = datetime;
    m_TerminalOverrides[ str.Name ]       = str;

    quotedStringOrDate.ValidateToken += quotedStringOrDate_ValidateToken;

    var ConstantExpression  = new NonTerminal( "ConstantExpression" );                

    ConstantExpression.Rule = number 
                            | datetime 
                            | str
                            | quotedStringOrDate;   

    this.Root = ConstantExpression;
     
  }

  void quotedStringOrDate_ValidateToken( object sender, ValidateTokenEventArgs e )
  {
    var current = e.Context.CurrentToken;
    var dateRegex = new Regex( @"'\d{4}/\d{2}/\d{2}'|'\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}(\.\d+)*'" );

    if( dateRegex.Match( current.Text ).Length > 0 )
    {
      // If regex matches, we have a date
      e.ReplaceToken( new Token( m_TerminalOverrides[ "datetime" ], current.Location, current.Text, current.Value ) );
    }
    else
    {
      // Else it must be just a plain string
      e.ReplaceToken( new Token( m_TerminalOverrides[ "str" ], current.Location, current.Text, current.Value ) );
    }
  }
}
Coordinator
Sep 5, 2013 at 5:35 PM
One suggestion: move 'dateRegex' to a field and create the Regex once in the constructor. AFAIK a Regex compiles its pattern the first time it is used, so creating it once avoids repeating that overhead for every token.
Roman
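A minimal sketch of that last suggestion, assuming the same date pattern as in the grammar above. The DateClassifier class and its IsDate method are hypothetical names used only to keep the example self-contained; in the real grammar the field would more likely sit on TestGrammar itself and be used from the ValidateToken handler.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Irony or of the code posted in the thread.
public static class DateClassifier
{
  // Created once when the type is first used, then reused for every token.
  private static readonly Regex DateRegex = new Regex(
    @"'\d{4}/\d{2}/\d{2}'|'\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}(\.\d+)*'" );

  // True when the quoted token text looks like a date or a date with a time.
  public static bool IsDate( string tokenText )
  {
    return DateRegex.IsMatch( tokenText );
  }
}
```

With this in place the handler could call DateClassifier.IsDate( current.Text ) instead of constructing a new Regex each time it runs.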