Ambiguous FreeTextLiteral

Nov 17, 2009 at 4:45 PM

Hello!
I have a problem with scanning a FreeTextLiteral.
It is detected only when the input text starts with a character that is missing from the hash table.
How can I create a terminal that matches free text starting with any character (including Latin characters)?
Can anybody help me?

Coordinator
Nov 17, 2009 at 4:52 PM

First, do you use this FreeTextLiteral in any of the rules? If not, you should add it to NonGrammarTerminals.

Otherwise, if it is used somewhere, then the Scanner should find it. If it does not define any Firsts prefixes (GetFirsts() returns null or an empty list), it should be automatically added to FallbackTerminals. Step through the process and check that this is so (stop when the scanner tries to find a terminal and check that your FreeTextLiteral is in the list of FallbackTerminals).

Let me know the results.

Roman

Nov 17, 2009 at 7:09 PM

I want to see the free text in the parse tree, so I haven't added the FreeTextLiteral to NonGrammarTerminals.
And the FreeTextLiteral is always in the list of FallbackTerminals.
I have to scan and parse text like this:

PORTFOLIO_EX Portfolio name;
DESCRIPTION Any description;
FIRMS_LIST ALL_FIRMS;

I wrote the code below for it:

          var PORTFOLIO = new NonTerminal("PORTFOLIO");
          var METADATA = new NonTerminal("METADATA");
          var PORTFOLIO_EX = new NonTerminal("PORTFOLIO_EX");
          var DESCRIPTION = new NonTerminal("DESCRIPTION");
          var FIRMS = new NonTerminal("FIRMS");
          var LINE_END = new NonTerminal("LINE_END");
          var FIRMS_VARIATIONS = new NonTerminal("FIRMS_VARIATIONS");
          var FIRMS_LIST = new NonTerminal("FIRMS_LIST");

          var FREE_TEXT = new FreeTextLiteral("FREE_TEXT", ";", "\n");
          var SEMICOLON = ToTerm(";", "SEMICOLON");
          var COMMA = ToTerm(",", "COMMA");
          var FIRM = new IdentifierTerminal("FIRM");


          this.Root = PORTFOLIO;
          PORTFOLIO.Rule = METADATA;
          METADATA.Rule = PORTFOLIO_EX + DESCRIPTION + FIRMS;
          PORTFOLIO_EX.Rule = "PORTFOLIO_EX" + FREE_TEXT + LINE_END | "PORTFOLIO" + FREE_TEXT + LINE_END;
          DESCRIPTION.Rule = "DESCRIPTION" + FREE_TEXT + LINE_END;
          FIRMS.Rule = "FIRMS_LIST" + FIRMS_VARIATIONS + LINE_END;
          FIRMS_VARIATIONS.Rule = "ALL_FIRMS" | FIRMS_LIST;
          FIRMS_LIST.Rule = MakePlusRule(FIRMS_LIST, COMMA, FIRM);
          LINE_END.Rule = SEMICOLON + NewLine;

And it's not working :(

Maybe there are defects in the rules?

Coordinator
Nov 18, 2009 at 3:40 PM

It would be helpful next time if you provided a sample that you are trying to parse and also pointed to the exact place where the parser reports an error.

But from what I see, my guess would be that the trouble is in the PORTFOLIO_EX definition - it should not have LINE_END in both clauses. Semicolon and line break are defined as terminators for FREE_TEXT elements and are consumed by this element - in the sense that when the parser completes reading FREE_TEXT, the current position of the scanner is at the beginning of the next line, after "\n". But the parser still wants to see LINE_END, because it is specified in the rule.

So just try removing LINE_END from the PORTFOLIO_EX rule. Let me know if it works. If that's not the problem, then please provide a sample and the failure position.

Roman

Nov 18, 2009 at 7:15 PM
I've tried different rules with the semicolon and the line break. They didn't give the result I need. But I kept one change:

          var FREE_TEXT = new FreeTextLiteral("FREE_TEXT", ";"); // "\n" removed

I made two screenshots showing how it works with the grammar:

Screenshot when parsing fails

Screenshot when it works well

As you can see, I added a prefix for the token. The prefix is '@', and with it parsing works correctly. When the free text starts with any character that can be the start of the FIRM terminal, it doesn't work :(
If I add a character to the Firsts list of the FreeTextLiteral, it works, but only when the text starts with that character. As I said, I want the free text to be able to start with any character (including Latin letters).
It isn't a good idea to add all Latin characters to the Firsts list, is it?
And more about the parsing error. I am trying to parse this text:

PORTFOLIO_EX The very long name of portfolio;
DESCRIPTION Any description. You can write much here;
FIRMS_LIST ALL_FIRMS;

The error happens at L,C (0, 13). It is highlighted in red in the text being parsed.
Error message: Syntax error, expected: FREE_TEXT
Parser state: S4

All parser states are below (sorry for the size):

<small> State S0
Shift items:
PORTFOLIO' -> ·PORTFOLIO EOF
PORTFOLIO -> ·METADATA
METADATA -> ·PORTFOLIO_EX DESCRIPTION FIRMS
PORTFOLIO_EX -> ·PORTFOLIO_EX FREE_TEXT LINE_END
PORTFOLIO_EX -> ·PORTFOLIO FREE_TEXT LINE_END
Transitions: PORTFOLIO->S1, METADATA->S2, PORTFOLIO_EX->S3, PORTFOLIO_EX->S4, PORTFOLIO->S5,

State S1
Shift items:
PORTFOLIO' -> PORTFOLIO ·EOF

State S2
Reduce items:
PORTFOLIO -> METADATA · [EOF]

State S3
Shift items:
METADATA -> PORTFOLIO_EX ·DESCRIPTION FIRMS
DESCRIPTION -> ·DESCRIPTION FREE_TEXT LINE_END
Transitions: DESCRIPTION->S7, DESCRIPTION->S8,

State S4
Shift items:
PORTFOLIO_EX -> PORTFOLIO_EX ·FREE_TEXT LINE_END
Transitions: FREE_TEXT->S9,

State S5
Shift items:
PORTFOLIO_EX -> PORTFOLIO ·FREE_TEXT LINE_END
Transitions: FREE_TEXT->S10,

State S6
Reduce items:
PORTFOLIO' -> PORTFOLIO EOF · []

State S7
Shift items:
METADATA -> PORTFOLIO_EX DESCRIPTION ·FIRMS
FIRMS -> ·FIRMS_LIST FIRMS_VARIATIONS LINE_END
Transitions: FIRMS->S11, FIRMS_LIST->S12,

State S8
Shift items:
DESCRIPTION -> DESCRIPTION ·FREE_TEXT LINE_END
Transitions: FREE_TEXT->S13,

State S9
Shift items:
PORTFOLIO_EX -> PORTFOLIO_EX FREE_TEXT ·LINE_END
LINE_END -> ·SEMICOLON LF
Transitions: LINE_END->S14, SEMICOLON->S15,

State S10
Shift items:
PORTFOLIO_EX -> PORTFOLIO FREE_TEXT ·LINE_END
LINE_END -> ·SEMICOLON LF
Transitions: LINE_END->S16, SEMICOLON->S15,

State S11
Reduce items:
METADATA -> PORTFOLIO_EX DESCRIPTION FIRMS · [EOF]

State S12
Shift items:
FIRMS -> FIRMS_LIST ·FIRMS_VARIATIONS LINE_END
FIRMS_VARIATIONS -> ·ALL_FIRMS
FIRMS_VARIATIONS -> ·FIRMS_LIST
FIRMS_LIST -> ·FIRM
FIRMS_LIST -> ·FIRMS_LIST COMMA FIRM
Transitions: FIRMS_VARIATIONS->S17, ALL_FIRMS->S18, FIRMS_LIST->S19, FIRM->S20,

State S13
Shift items:
DESCRIPTION -> DESCRIPTION FREE_TEXT ·LINE_END
LINE_END -> ·SEMICOLON LF
Transitions: LINE_END->S21, SEMICOLON->S15,

State S14
Reduce items:
PORTFOLIO_EX -> PORTFOLIO_EX FREE_TEXT LINE_END · [DESCRIPTION]

State S15
Shift items:
LINE_END -> SEMICOLON ·LF
Transitions: [line break]->S22,

State S16
Reduce items:
PORTFOLIO_EX -> PORTFOLIO FREE_TEXT LINE_END · [DESCRIPTION]

State S17
Shift items:
FIRMS -> FIRMS_LIST FIRMS_VARIATIONS ·LINE_END
LINE_END -> ·SEMICOLON LF
Transitions: LINE_END->S23, SEMICOLON->S15,

State S18
Reduce items:
FIRMS_VARIATIONS -> ALL_FIRMS · [SEMICOLON]

State S19 (Inadequate)
Shift items:
FIRMS_LIST -> FIRMS_LIST ·COMMA FIRM
Reduce items:
FIRMS_VARIATIONS -> FIRMS_LIST · [SEMICOLON]
Transitions: COMMA->S24,

State S20
Reduce items:
FIRMS_LIST -> FIRM · [COMMA SEMICOLON]

State S21
Reduce items:
DESCRIPTION -> DESCRIPTION FREE_TEXT LINE_END · [FIRMS_LIST]

State S22
Reduce items:
LINE_END -> SEMICOLON LF · [EOF DESCRIPTION FIRMS_LIST]

State S23
Reduce items:
FIRMS -> FIRMS_LIST FIRMS_VARIATIONS LINE_END · [EOF]

State S24
Shift items:
FIRMS_LIST -> FIRMS_LIST COMMA ·FIRM
Transitions: FIRM->S25,

State S25
Reduce items:
FIRMS_LIST -> FIRMS_LIST COMMA FIRM · [COMMA SEMICOLON] </small>

Roman, thanks for your willingness to help! :)

It's possible to scan and parse hard grammars like C# or SQL with Irony.
I'm sure it has to work for my task! What else can I try?

Coordinator
Nov 18, 2009 at 8:41 PM
Edited Nov 18, 2009 at 8:42 PM

I see the problem now. If you look at the bottom-right corner of your first screenshot, you will see the token list. The second token produced by the scanner is "FIRM" - the identifier token you defined for firms. This is a long-outstanding inconsistency in how Irony deals with terminals without prefixes. The trouble is in the Scanner.SelectTerminals method: no-prefix terminals are picked up for consideration ONLY if there are no other candidates with prefixes. In your case there is one such candidate, the FIRM identifier, so the scanner leaves out FREE_TEXT, which is in FallbackTerminals.

This is wrong and should be fixed. I simply have not yet come up with a consistent and reliable fix - it is actually more than a fix, it is a refactoring and change of the entire scanning algorithm. Until now this bug showed up only as little inconsistencies, but your case is blown up by it. Sorry, I will get to this ASAP. For now, add all Latin upper- and lower-case letters to the Firsts list of the FreeTextLiterals you use - yes, it looks silly, and it would work only for English, but it will work until the fix is there.
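
To make the selection rule concrete, here is a minimal, self-contained C# model of the behavior described in this post. It is not Irony's actual Scanner.SelectTerminals code: ModelTerminal, ScannerModel, SelectCandidates and LatinLetters are hypothetical names invented for illustration, and the first characters of FIRM are simplified to Latin letters only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a terminal: a name plus the characters it may start with.
// An empty Firsts set marks a "fallback" (no-prefix) terminal such as FREE_TEXT.
public sealed class ModelTerminal {
    public string Name { get; }
    public HashSet<char> Firsts { get; } = new HashSet<char>();

    public ModelTerminal(string name, IEnumerable<char> firsts) {
        Name = name;
        Firsts.UnionWith(firsts);
    }
}

public static class ScannerModel {
    // Models the rule from the post: fallback terminals are considered
    // ONLY when no terminal with a matching first character exists.
    public static List<string> SelectCandidates(IEnumerable<ModelTerminal> terminals, char current) {
        var all = terminals.ToList();
        var byPrefix = all.Where(t => t.Firsts.Contains(current))
                          .Select(t => t.Name)
                          .ToList();
        if (byPrefix.Count > 0) return byPrefix;   // fallback terminals are skipped here
        return all.Where(t => t.Firsts.Count == 0)
                  .Select(t => t.Name)
                  .ToList();
    }

    // The workaround from the post: every Latin lower- and upper-case letter.
    public static IEnumerable<char> LatinLetters() {
        for (char c = 'a'; c <= 'z'; c++) {
            yield return c;
            yield return char.ToUpperInvariant(c);
        }
    }
}
```

With a FIRM terminal built from LatinLetters() and a FREE_TEXT terminal built with no first characters, SelectCandidates for 'T' (the character at column 13 of the failing input) returns only FIRM, while for '@' it returns only FREE_TEXT. After FREE_TEXT.Firsts.UnionWith(ScannerModel.LatinLetters()), the call for 'T' returns both FIRM and FREE_TEXT. How the real scanner then chooses between two candidates is outside this model.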

Sorry again

Roman 

Coordinator
Nov 18, 2009 at 8:44 PM

By the way, a quite similar issue is reported in the latest bug opened on the Issue Tracker page. I need to fix it at last.

Roman

Nov 19, 2009 at 2:51 AM

I saw this behavior in the debugger, but I was sure it worked correctly :) I thought the problem was in how I use Irony.

I'll definitely wait for the next release where this is fixed, because it's still simpler than doing the same with any *cc toolkit!

Thank you! :)

Coordinator
Nov 22, 2009 at 12:38 AM
Edited Nov 23, 2009 at 2:47 AM

This should be fixed in the latest source drop. Here's what your grammar can look like:

  public class SamplePortfolioGrammar : Grammar {
    public SamplePortfolioGrammar() {
      var PORTFOLIO = new NonTerminal("PORTFOLIO");
      var METADATA = new NonTerminal("METADATA");
      var PORTFOLIO_EX = new NonTerminal("PORTFOLIO_EX");
      var DESCRIPTION = new NonTerminal("DESCRIPTION");
      var FIRMS = new NonTerminal("FIRMS");
      var LINE_END = new NonTerminal("LINE_END");
      var FIRMS_VARIATIONS = new NonTerminal("FIRMS_VARIATIONS");
      var FIRMS_LIST = new NonTerminal("FIRMS_LIST");

      var FREE_TEXT = new FreeTextLiteral("FREE_TEXT", "\n");
      var SEMICOLON = ToTerm(";", "SEMICOLON");
      var COMMA = ToTerm(",", "COMMA");
      var FIRM = new IdentifierTerminal("FIRM");


      this.Root = PORTFOLIO;
      PORTFOLIO.Rule = METADATA + NewLineStar; //to allow empty lines after
      METADATA.Rule = PORTFOLIO_EX + DESCRIPTION + FIRMS;
      PORTFOLIO_EX.Rule = "PORTFOLIO_EX" + FREE_TEXT + LINE_END | "PORTFOLIO" + FREE_TEXT + LINE_END;
      DESCRIPTION.Rule = "DESCRIPTION" + FREE_TEXT + LINE_END;
      FIRMS.Rule = "FIRMS_LIST" + FIRMS_VARIATIONS + LINE_END;
      FIRMS_VARIATIONS.Rule = "ALL_FIRMS" | FIRMS_LIST;
      FIRMS_LIST.Rule = MakePlusRule(FIRMS_LIST, COMMA, FIRM);
      LINE_END.Rule = NewLine;

      RegisterPunctuation(SEMICOLON, LINE_END);
      LanguageFlags |= LanguageFlags.NewLineBeforeEOF;
    }
  }

This version does not use any prefixes or require a semicolon as a terminator. You might also think about removing some superfluous things like LINE_END - use NewLine directly.
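
The effect of changing the terminator list from ";" to "\n" can be illustrated with a small stand-alone sketch. This is only a model of "read until the earliest terminator", not Irony's FreeTextLiteral implementation; FreeTextModel and ReadUntil are hypothetical names, and whether the real terminal also consumes the terminator is deliberately not modeled.

```csharp
using System;

public static class FreeTextModel {
    // Returns the text from 'start' up to (but not including) the earliest
    // terminator, or the rest of the input when no terminator occurs.
    public static string ReadUntil(string input, int start, params string[] terminators) {
        int end = input.Length;
        foreach (var terminator in terminators) {
            int pos = input.IndexOf(terminator, start, StringComparison.Ordinal);
            if (pos >= 0 && pos < end) end = pos;   // keep the earliest match
        }
        return input.Substring(start, end - start);
    }
}
```

For the input "PORTFOLIO_EX Portfolio name;\nDESCRIPTION Any description;\n" and start position 13, ReadUntil with the terminators ";" and "\n" yields "Portfolio name", while ReadUntil with only "\n" yields "Portfolio name;". In this model, with "\n" as the only terminator, a trailing semicolon becomes part of the free text.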

Nov 23, 2009 at 12:37 PM

Wow! It works fine!
But the semicolon is part of the rules. It took me a long time to figure out why it behaves differently from the former rules :) You removed the ';' character from the FreeTextLiteral :)

The rules presented here aren't complete. There are many other rules for the body of the program, which I removed from the post and which don't use the semicolon as a separator. Well, now I can add all of them! Thank you!

P.S. To tell the truth, I have another problem. I think it is in my definition of the grammar, and if I can't solve it, I'll post it :)

Thanks for your help! :)