How to make a string from undelimited text.

Dec 24, 2008 at 11:19 AM
I'm trying to write a parser for the PILOT programming language. It doesn't use delimited strings. For example, what other languages might write as:

print "Hello, " + Name + "!"

in PILOT would be:

T: Hello, $Name !

Essentially, a string is everything that is not something else. How do I represent that in Irony?

(I'm also planning to create parsers for Markdown and Velocity, so this is the third time this issue has come up.)
Coordinator
Dec 26, 2008 at 7:03 PM
Edited Dec 26, 2008 at 7:04 PM
You've got quite a language there, man. I have no immediate answer. The reason is that scanning in PILOT cannot be cleanly separated from parsing. Like some other languages dating back to the 1960s and earlier (FORTRAN is another example), it was created without regard to scanner/parser separation, which evolved later as a compiler architecture and became standard for later languages. Irony's architecture (like Lex/Yacc and other tools) is based on this separation, so making it work for PILOT would not be straightforward.
If this is just a test project to try Irony, I would recommend looking at some other language as a target. If this is more serious, let's see if we can do something about it. One thing that would help in this case is to introduce a back link from parser to scanner: when the scanner selects a list of candidate Terminals to scan the next token, it may ask the parser to filter that list based on the tokens currently expected in the parser state. I find that this scanner/parser link may be beneficial in other situations too, making it easier to write non-conflicting grammars for traditional languages. It is on my to-do list, although not very high on it - there are a lot of items before it. Let me know how urgent this is for you.
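To make the back-link idea concrete, here is a minimal sketch in Python rather than Irony's C#. Nothing in it is Irony API: the `Terminal` class, the terminal names, the regular expressions, and the toy `tokenize_line` "parser" are all assumptions made up for illustration. The only point is that the scanner skips any candidate terminal the parser does not currently expect.

```python
import re
from dataclasses import dataclass


@dataclass
class Terminal:
    name: str
    pattern: str  # regular expression tried at the current position


# Hypothetical terminals for a PILOT-like line (not Irony classes).
TERMINALS = [
    Terminal("COMMAND", r"[A-Z]+:"),
    Terminal("VARIABLE", r"\$[A-Za-z]\w*"),
    Terminal("TEXT", r"[^$\n]+"),
]


def scan(text, pos, expected):
    """Return (name, lexeme, new_pos) for the first candidate terminal
    that the parser currently expects and that matches at pos."""
    for term in TERMINALS:
        if term.name not in expected:
            continue  # parser feedback filters this candidate out
        match = re.compile(term.pattern).match(text, pos)
        if match:
            return term.name, match.group(), match.end()
    raise SyntaxError(f"no expected terminal matches at position {pos}")


def tokenize_line(line):
    """Toy stand-in for parser state: a command first, then text or variables."""
    tokens, pos, expected = [], 0, {"COMMAND"}
    while pos < len(line):
        name, lexeme, pos = scan(line, pos, expected)
        tokens.append((name, lexeme))
        expected = {"TEXT", "VARIABLE"}  # after the command, only these are valid
    return tokens


if __name__ == "__main__":
    print(tokenize_line("T: Hello, $Name !"))
```

In this sketch, a line such as `T:A: b` yields one COMMAND token followed by a single TEXT token, because COMMAND is no longer in the expected set when the scanner reaches `A:`.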
Dec 29, 2008 at 1:38 PM
Well, I was using it as a test project, although I was planning to continue the exercise through to IL generation, so I was hoping to start with a simple language.

Looking at the "Google-like full text search" article, they use an identifier terminal for text.  That sort-of worked for me.  I'm not sure if it's a complete solution, as I was stopped by another problem (separate thread).

The problem with moving onto a different project is that the other two things I'd hoped to use Irony for were parsers for Markdown (http://en.wikipedia.org/wiki/Markdown) and Velocity (http://en.wikipedia.org/wiki/Apache_Velocity), both of which have the same trait.  If you are unfamiliar with them, consider how you would write a parser for XML or HTML.


Coordinator
Dec 29, 2008 at 10:18 PM
About the "Google-like search" grammar - I think this is different from your case. In the search grammar, every word in the search phrase is treated as a separate token that becomes a separate argument in the final expression. What you need is, at a certain position, to join everything into a single literal string. This "certain position" is identified by the parser, while creating the literal string is supposed to be done by the scanner, which normally runs before the parser. So at a minimum you'll need a custom FreeFormStringTerminal class that would either produce a token or return null depending on feedback information from the parser state. In general, you can use CompilerContext to set some flag in the parser that would be used by this custom terminal as a signal to go ahead and produce a token. The CompilerContext.Values dictionary is a container for storing such custom flags.
About Markdown and Velocity. Velocity - I cannot say much, but I think implementing basic language parsing is possible with Irony. A more interesting aspect is supporting multi-language grammar combinations (HTML + JavaScript, HTML + Velocity), where two or more grammars are involved in parsing a single input stream. I have been thinking about adding this support to Irony, although I have little idea for now how best to implement it. It is definitely on my list, as this is quite a common case.
Markdown (wiki language) - also a very interesting case, but it definitely requires some extra work - custom terminal(s) and some extra support in the scanner. In fact, I think most of the work would end up in custom terminals.
To sum up, none of these languages is a typical programming language of the kind Irony was architected for. But it would be interesting to see how far we can go by tweaking Irony to support these kinds of things. This is a good test of the tool's flexibility and power.
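As a rough illustration of the flag-driven terminal described above, here is a sketch in Python rather than C#. It does not use Irony's real classes or signatures: the `try_match` method, the `expect_free_text` key, and the plain dict standing in for the CompilerContext.Values dictionary are all hypothetical. It also assumes that free text ends at a `$` or a newline.

```python
class FreeFormStringTerminal:
    """Hypothetical custom terminal: produces a token only when the
    parser has set a flag in the shared values dictionary."""

    FLAG = "expect_free_text"  # hypothetical key, not an Irony constant

    def try_match(self, values, text, pos):
        """Return ("TEXT", lexeme, new_pos), or None to let other terminals try."""
        if not values.get(self.FLAG):
            return None  # the parser has not asked for free-form text here
        end = pos
        # Assumption: free text runs until a '$' (variable) or end of line.
        while end < len(text) and text[end] not in "$\n":
            end += 1
        if end == pos:
            return None  # nothing to consume at this position
        return ("TEXT", text[pos:end], end)


if __name__ == "__main__":
    values = {}  # stands in for the shared flags dictionary
    terminal = FreeFormStringTerminal()
    print(terminal.try_match(values, "Hello, $Name !", 0))  # flag not set
    values[FreeFormStringTerminal.FLAG] = True  # parser reached the text position
    print(terminal.try_match(values, "Hello, $Name !", 0))
```

Returning None when the flag is unset mirrors the "produce token or return null" behavior: the terminal stays out of the way everywhere except where the parser has signaled that free text is expected.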