Bug in line/column counters

Developer
Feb 6, 2010 at 8:44 PM
Edited Feb 6, 2010 at 8:52 PM

Hi, Roman!

I found a bug in a line/column counters which occurs when the grammar uses FreeTextLiteral with ConsumeTerminator option set. Consider the following sample program:

using System;
using Irony.Ast;
using Irony.Parsing;

class SampleGrammar : Grammar
{
  public SampleGrammar(bool consumeTerminator) : base(true)
  {
    var FreeText = new FreeTextLiteral("FreeText");
    FreeText.Terminators.AddRange("\r", "\n");
    FreeText.FreeTextOptions |= FreeTextOptions.AllowEof;

    if (consumeTerminator)
      FreeText.FreeTextOptions |= FreeTextOptions.ConsumeTerminator;

    var Program = new NonTerminal("Program");
    var Statement = new NonTerminal("Statement");
    Program.Rule = MakePlusRule(Program, Statement);
    Statement.Rule = "A" + FreeText;
    Root = Program;
  }

  static void RunSample(string title, SampleGrammar grammar, string text)
  {
    Console.WriteLine("Test case: {0}", title);

    var parser = new Parser(grammar);
    var tree = parser.Parse(text);

    if (tree.HasErrors())
      foreach (var pm in tree.ParserMessages)
        Console.WriteLine("at {0}: {1}", pm.Location, pm.Message);

    Console.WriteLine();
  }

  static void Main()
  {
    var sampleText = "A Hard Day's Night\r\n\tI Am the Walrus";

    RunSample(@"NewLine is \r\n, ConsumeTerminator is not set", new SampleGrammar(false), sampleText);
    RunSample(@"NewLine is \r\n, ConsumeTerminator is set", new SampleGrammar(true), sampleText);

    sampleText = "A Hard Day's Night\n\tI Am the Walrus";
    
    RunSample(@"NewLine is \n, ConsumeTerminator is not set", new SampleGrammar(false), sampleText);
    RunSample(@"NewLine is \n, ConsumeTerminator is set", new SampleGrammar(true), sampleText);
  }
}

It prints the following:

Test case: NewLine is \r\n, ConsumeTerminator is not set
at (1:8): Syntax error, expected: A

Test case: NewLine is \r\n, ConsumeTerminator is set
at (2:8): Syntax error, expected: A

Test case: NewLine is \n, ConsumeTerminator is not set
at (1:8): Syntax error, expected: A

Test case: NewLine is \n, ConsumeTerminator is set
at (1:1): Syntax error, expected: A

Seems that with CR+LF line terminators line counter gets incremented twice on every FreeTextLiteral. For example, when input text is

var sampleText = @"A Hard Day's Night
  A Taste of Honey                  
  A Day in a Life                   
  I Am the Walrus"; // error at line 3

it shows error at (6:2) instead of (3:2). I'm currently investigating this case (seems that problem occurs in SourceStream::MoveLocationToPreviewPosition() when PreviewPosition occurs between CR and LF characters, so nlCount is incremented for both CR and LF).

Another issue is concerning column counter. Haven't looked into it yet.

Developer
Feb 6, 2010 at 9:56 PM

Line counter bug can be fixed as follows:

public void MoveLocationToPreviewPosition() {
  if (_location.Position == PreviewPosition) return; 
  if (PreviewPosition > Text.Length)
    PreviewPosition = Text.Length;
  // First, check if preview position is in the same line; if so, just adjust column and return
  //  Note that this case is not line start, so we do not need to check tab chars (and adjust column) 
  if (PreviewPosition <= _nextNewLinePosition || _nextNewLinePosition < 0) {
    _location.Column += PreviewPosition - _location.Position;
    _location.Position = PreviewPosition;
    return;
  }
  //So new position is on new line (beyond _nextNewLinePosition)
  //First count \n chars in the string fragment
  int lineStart = _nextNewLinePosition;
  int nlCount = 1; //we start after old _nextNewLinePosition, so we count one NewLine char
  CountCharsInText(Text, _scannerData.LineTerminatorsArray, lineStart + 1, PreviewPosition - 1, ref nlCount, ref lineStart);
  _location.Line += nlCount;
  //at this moment lineStart is at start of line where newPosition is located 
  //Calc # of tab chars from lineStart to newPosition to adjust column#
  int tabCount = 0;
  int dummy = 0;
  if (TabWidth > 1)
    CountCharsInText(Text, _tab_arr, lineStart, PreviewPosition - 1, ref tabCount, ref dummy);

  //adjust TokenStart with calculated information
  _location.Position = PreviewPosition;
  _location.Column = PreviewPosition - lineStart - 1;
  if (tabCount > 0)
    _location.Column += (TabWidth - 1) * tabCount; // "-1" to count for tab char itself

  // handle special case: PreviewPosition points at LF following a CR
  var startFrom = PreviewPosition;
  if (PreviewPosition > 0 && Text.Length > PreviewPosition && Text[PreviewPosition - 1] == '\r' && Text[PreviewPosition] == '\n')
    startFrom++;

  //finally cache new line and assign TokenStart
  _nextNewLinePosition = Text.IndexOfAny(_scannerData.LineTerminatorsArray, PreviewPosition startFrom);
}

Looks a bit ugly, though. But sadly, this case of CR+LF is not handled by CountCharsInText() because lineStart + 1 > PreviewPosition - 1.

Coordinator
Feb 8, 2010 at 6:27 PM

thanks, I will look into this

Feb 28, 2010 at 3:16 AM

why don't you declare \r\n as an end terminal instead of just \r?, if you do then you should get 3:2 for the error location.

 

FreeText.Terminators.AddRange("\r\n", "\n");

 

Developer
Feb 28, 2010 at 10:52 AM

"\r" is a standard line terminator in MacOS. The scanner should accept all kinds of EOL symbols.

I can use "\r\n" as a workaround, but it's better to fix the bug in the scanner.

Mar 1, 2010 at 12:29 AM

The column count is wrong in the 4th case because it was part of whitespace that was skipped between tokens. It looks like skipped whitespace which doesnt begin with a newline doesn't expand tabs when setting the column.

Developer
Mar 2, 2010 at 1:08 AM
Edited Mar 2, 2010 at 12:11 PM

Yes, you're right. Here is related comment in MoveLocationToPreviewPosition() method:

//  Note that this case is not line start, so we do not need to check tab chars (and adjust column)

I don't think it's correct. Instead, columns should be counted as follows (assuming TabWidth == 8):

0       8       16      24
-------»int someVar =--»3;
-------»string s =-----»"hello";

Tab positions should be aligned to TabWidth.