Repeating texts in address parsing

Aug 24, 2010 at 9:32 PM

First of all, kudos on the creative breakthrough! I was able to play with this and create a rudimentary address parser!

Still, as a novice with this tool, I have a simple question.

I am writing an address parser that scans addresses...

1. We may have addresses "123 ABC street" and "123 Martin Luther King Street"

2. Mine works with the first case, but I am having a hard time with the second.

Here is my simple rule:

// street address rule
          street_addr.Rule = house_nbr + street | street;
          street.Rule = street_name + hPOSTDIR_E1 |
                        street_name + hPOSTDIR_E2 |
                        street_name + hPOSTDIR_W1 |
                        street_name + hPOSTDIR_W2 |
                        street_name + hPOSTDIR_S1 |
                        street_name + hPOSTDIR_S2 |
                        street_name + hPOSTDIR_N1 |
                        street_name + hPOSTDIR_N2 |
                        street_name + hPOSTDIR_NW |
                        street_name + hPOSTDIR_NE |
                        street_name + hPOSTDIR_SW |
                        street_name + hPOSTDIR_SE;
          street_name.Rule = road_literal + myNumb | //road_literal can be ROAD, RD, AVE etc etc
                             myNumb + road_literal |
                             myStreet + road_literal;

In the above, myStreet is a free-text literal.

How can I make the myStreet literal occur multiple times, separated by spaces and terminated by the road literal?

As it stands, this works OK for "123 abc street".

Sorry if this is too elementary. I am looking for some guidance.

Thanks in advance from Venkat

Aug 25, 2010 at 10:35 PM
Edited Aug 25, 2010 at 10:36 PM

First of all, I'd recommend simplifying the grammar; for example, you can define a nonterminal hPostDir and merge all these permutations into it, so the definition of "street" becomes much simpler.
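A minimal sketch of that simplification, assuming the Irony library is referenced. The terminal spellings ("E", "EAST", "STREET", and so on), the class name, and the placeholder street_name rule are assumptions for illustration; the original hPOSTDIR_* definitions are not shown in the thread.

```csharp
using Irony.Parsing;

// Illustrative sketch only: names mirror the thread, terminal spellings are assumed.
public class AddressGrammarSketch : Grammar
{
    // 'false' asks Irony for a case-insensitive grammar.
    public AddressGrammarSketch() : base(false)
    {
        var houseNbr   = new NumberLiteral("house_nbr");
        var word       = new IdentifierTerminal("word");
        var roadLit    = new NonTerminal("road_literal");
        var hPostDir   = new NonTerminal("hPostDir");
        var streetName = new NonTerminal("street_name");
        var street     = new NonTerminal("street");
        var streetAddr = new NonTerminal("street_addr");

        // All post-direction variants live in a single nonterminal.
        hPostDir.Rule = ToTerm("E") | "EAST" | "W" | "WEST"
                      | "S" | "SOUTH" | "N" | "NORTH"
                      | "NW" | "NE" | "SW" | "SE";

        // Placeholder rules so the sketch is complete; the thread's
        // real street_name rule has more alternatives.
        roadLit.Rule    = ToTerm("STREET") | "ST" | "ROAD" | "RD" | "AVE";
        streetName.Rule = word + roadLit;

        // Twelve alternatives collapse into one.
        street.Rule     = streetName + hPostDir;
        streetAddr.Rule = houseNbr + street | street;

        Root = streetAddr;
    }
}
```

As in the original rule, every alternative of "street" still requires a post-direction; only the repetition is removed.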

Overall, I don't think what you are asking for is possible, or at least not easily doable. FreeTextLiteral would swallow anything it sees, including the road literal. You can try increasing the priority of the road literal (or lowering the other one: myStreet.Priority = -10). You should try to build an unambiguous grammar, so that when you manually (mentally) map an address to your grammar, you see one single way of mapping it to grammar expressions.

Anyway, why are you trying to split the street name? As far as I understand, "Main St SE" is one thing; that is the street name, so why split it further? The same goes for Rd, Pl and other prefixes and suffixes: they are all an integral part of the unique street name.


Aug 25, 2010 at 10:47 PM

Thank you Roman. That was pretty quick. I will try your suggestions later in the day or tomorrow.

Regarding why the street names are structured this way:

We can have street names like "Martin Luther King Drive", i.e. multi-word strings forming a street name.

I had a similar grammar in yacc and lex, but I am more interested in seeing it all in .NET and C#.

Will keep you updated on my progress.

Aug 26, 2010 at 8:14 PM

I was just starting to write this, so the improvements you suggested will definitely help as things solidify. Thanks anyway!

When I defined myStreetList and myStreet_part as nonterminals and used the following rule, it seems to work!

myStreetList.Rule = MakeStarRule(myStreetList,  myStreet);

BTW: What is the difference between MakeStarRule & MakePlusRule?

Aug 26, 2010 at 10:52 PM

Star and Plus are Kleene operators: Star means "zero or more", Plus means "one or more".
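The difference can be sketched side by side, assuming the Irony library is referenced. The class and nonterminal names here are hypothetical, and an IdentifierTerminal stands in for the street-name word used in the thread.

```csharp
using Irony.Parsing;

// Illustrative sketch only: names are hypothetical.
public class StreetWordsSketch : Grammar
{
    public StreetWordsSketch() : base(false)
    {
        var word       = new IdentifierTerminal("word");
        var zeroOrMore = new NonTerminal("words_star");
        var oneOrMore  = new NonTerminal("words_plus");

        // Star: the list may be empty, so an empty street name would be accepted.
        zeroOrMore.Rule = MakeStarRule(zeroOrMore, word);

        // Plus: at least one word is required.
        oneOrMore.Rule = MakePlusRule(oneOrMore, word);

        // Only the Plus variant is reachable from the root in this sketch.
        Root = oneOrMore;
    }
}
```

For a multi-word street name such as "Martin Luther King", the Plus form is probably the closer fit, since a street name with zero words is not meaningful.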