Terminal to get everything except X,Y, N...

Aug 18, 2011 at 12:05 AM

I'm trying to build a parser for a DSL I'm building and found a problem I can't find a way around.

I have the following input:

yesterday at drug store 

With the following grammar:

var expr = new NonTerminal("expr");
var location = new NonTerminal("location");

var date = new RegexBasedTerminal("date", @"today|tomorrow|yesterday");
var preposition = new RegexBasedTerminal("preposition", @"at");
var locationToken = new RegexBasedTerminal("locationToken", @".*");

preposition.Priority = Terminal.LowestPriority;
locationToken.Priority = Terminal.LowestPriority;
date.Priority = Terminal.HighestPriority;

expr.Rule =
	date + location
	| location + date;

location.Rule =
	preposition + locationToken
	| locationToken;          

this.MarkPunctuation(preposition);

this.Root = expr;

This works like a charm, giving me a Location and a Date. However, if I change the input to

at drug store yesterday

It won't work because the locationToken regex is too broad. I thought that Priority could solve my problem, but it didn't. Is there a way to do this? Or, phrased more broadly: is there a way to create a Terminal that gets everything that couldn't be tokenized?
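To illustrate why the broad pattern is a problem, here is a small standalone sketch that uses plain .NET regexes rather than Irony's scanner. The helper name MatchAt is hypothetical and only mimics the idea of "match a pattern anchored at the current position"; it is not how Irony itself picks tokens.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Hypothetical helper: returns the text a pattern consumes when it must
    // start exactly at the given position, or null if it does not match there.
    public static string MatchAt(string pattern, string input, int position)
    {
        var m = new Regex(pattern).Match(input, position);
        return (m.Success && m.Index == position) ? m.Value : null;
    }
}
```

With the input "at drug store yesterday" and position 3 (just after "at "), MatchAt(@".*", ...) returns "drug store yesterday", so the date is swallowed before the date pattern ever gets a chance. A narrower pattern such as @"\S+" returns only "drug" at the same position.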

Thanks a lot!

Coordinator
Aug 18, 2011 at 6:46 AM
Edited Aug 18, 2011 at 6:46 AM

A terminal that matches everything that couldn't be tokenized: no, there's no such terminal in Irony yet. It is one of the suggested problems to work on (BackgroundTextTerminal, P11: [http://irony.codeplex.com/wikipage?title=ContribProjects])

I don't know much about regex, so this is just an idea: can you change locationToken to match one word only, i.e. match any char except space?

Aug 18, 2011 at 1:47 PM

Thanks for the quick answer :)

I need the locationToken to capture everything, since the location in question can be composed of multiple words so just a single word won't do. For now I think I will use a delimiter character (like comma for example) or use a StringLiteral to work around it.

I will try to patch it to implement P11, even though I'm not really good with parsers. I will let you know if I succeed.

Thanks again! :)

Aug 18, 2011 at 3:15 PM
Edited Aug 18, 2011 at 4:52 PM

Hi!

I think I was able to implement P11. There are probably some problems with it (I can think of some situations), but I think I can share it here as a rough draft so you (and others) can take a look and comment on it.

I am still going to test it further and when done will submit a patch for the project.

Here is the code:

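using System;
using System.Collections.Generic;
using System.Linq;
using Irony.Parsing;

public class BackgroundTextTerminal : Terminal
    {
        public BackgroundTextTerminal(string name) : base(name) { }

        private List<Terminal> OtherTerminals { get; set; }

        public override void Init(GrammarData grammarData)
        {
            // Collect every other terminal in the grammar; background text ends where one of them matches.
            OtherTerminals = grammarData.AllTerms
                .Where(x => x.GetType().IsSubclassOf(typeof(Terminal)) && x.Name != this.Name)
                .Select(x => (Terminal)x)
                .ToList();

            base.Init(grammarData);
        }

        public override Token TryMatch(ParsingContext context, ISourceStream source)
        {
            // Remember the start before probing, since other terminals may move PreviewPosition.
            var start = source.PreviewPosition;
            var end = start;

            // Advance one char at a time until another terminal matches or the input ends.
            while (end < source.Text.Length && !AnyOtherMatchesAt(context, source, end))
                end++;

            var text = source.Text.Substring(start, end - start);
            if (String.IsNullOrWhiteSpace(text))
            {
                source.PreviewPosition = start;
                return null;
            }

            source.PreviewPosition = end;
            return source.CreateToken(this.OutputTerminal);
        }

        // Tries a full match of every other terminal at the given position.
        private bool AnyOtherMatchesAt(ParsingContext context, ISourceStream source, int position)
        {
            foreach (var terminal in OtherTerminals)
            {
                // Reset before each probe because a previous TryMatch may have advanced it.
                source.PreviewPosition = position;
                if (terminal.TryMatch(context, source) != null)
                    return true;
            }
            return false;
        }
    }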

Coordinator
Aug 18, 2011 at 5:23 PM

 

Looks good, but... That's not what I had in mind - but you're on the right track!

What your code does is stop at every char and try to match ALL other terminals, making a full TryMatch call each time. This is very inefficient.
What you should do instead is get a full list of all Firsts symbols from all other terminals (using the Terminal.GetFirsts() method); then, in the match method, run through this list and search for the first occurrence of any of these First prefixes.

(There might be some extra optimizations: for example, group the terminals by first char and search for that first char first, then try to match all prefixes in the group at this position P.)
When you finally match a prefix at position Px, call otherTerminal.TryMatch to try a full match and create a token; if a non-null token OToken is returned, don't throw it away. First create a Background token BToken with all the text before position Px, and then return a multitoken with (BToken, OToken) inside.
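The prefix search described above might look roughly like the following sketch. It works on plain strings and does not touch Irony's API: the class BackgroundScan and both method names are hypothetical, and the prefixes are assumed to be the strings collected from the other terminals' GetFirsts() results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BackgroundScan
{
    // Groups prefixes by first char so each input position needs a single lookup.
    public static Dictionary<char, List<string>> GroupByFirstChar(IEnumerable<string> prefixes)
    {
        return prefixes
            .Where(p => !string.IsNullOrEmpty(p))
            .GroupBy(p => p[0])
            .ToDictionary(g => g.Key, g => g.ToList());
    }

    // Returns the first position >= start where any prefix begins,
    // or text.Length if no prefix occurs in the rest of the text.
    public static int FindFirstPrefix(string text, int start, Dictionary<char, List<string>> groups)
    {
        for (int pos = start; pos < text.Length; pos++)
        {
            List<string> candidates;
            if (!groups.TryGetValue(text[pos], out candidates))
                continue; // no terminal can start with this char

            foreach (var prefix in candidates)
            {
                if (string.CompareOrdinal(text, pos, prefix, 0, prefix.Length) == 0)
                    return pos; // candidate position Px for a full TryMatch
            }
        }
        return text.Length;
    }
}
```

For the prefixes "today", "tomorrow", "yesterday" and "at", scanning "at drug store yesterday" from position 3 returns 14, so the background text would be "drug store " (positions 3 to 13). A prefix hit is only a candidate: a short prefix can also occur inside an ordinary word, which is why the full TryMatch of the other terminal still has to confirm the match at that position before the background token is cut off there.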