In case you're interested you can download the Excel formula grammar here:
http://www.lyquidity.com/downloads/ExcelGrammar.zip It's NOT thoroughly tested but it should parse the various elements of an Excel formula and fail in the same way Excel does.
Irony is a great addition to the set of .NET options available to programmers so I hope you will take these comments as coming from someone who is trying to support the code. I found it challenging at times because it's a long time since I've tried to work with a tool
like this. As someone who is intimate with the code you have a detailed mental image of the working of Irony but to newbies like me the lack of comments around the code makes it harder to penetrate. When there are comments they are useful but are not usually
presented using the XML documentation notation, so they cannot appear in IntelliSense, nor are they picked up by, say, NDoc.
Putting my product manager's hat on, I wonder who you see as your target audience. If my objective was to have a performant parser, I'd probably take the time to hand-craft the parser. Or create the parser in C or C++ because if I can get managed code to parse
10,000 lines per second I'll be able to do better in C or C++. So maybe it's someone who wants to create an arbitrary grammar in the .NET environment. However, it's quite hard to represent some grammars using Irony (or at least ones Irony is not explicitly built
to target) because of the way it works. I'm not a compiler expert but it seems to me, as a software designer of more years than I care to remember, that the scanner is trying to do more than it should, making it necessary to do more work in terminals than is
ideal. The Irony scanner is responsible for more than tokenizing the source because it is also responsible for determining the
correct terminal for each token. This is a lot of burden on the scanner since it doesn't have access to the grammar state. It seems to me that the job could be split in two: a tokenizer responsible for turning each piece of source text into
all the candidate tokens it could represent, which are then passed to the parser so it can decide which is the correct terminal to use. The scanner evaluates all the terminal types valid at a point anyway so feeding the parser all the candidate terminals isn't going to affect performance.
I appreciate that feeding multiple terminals to the parser could require the parser to maintain multiple alternative states until all but the correct one fail the grammar rules and that this could, potentially, affect performance. But it may have the benefit
of making it possible to do much more declaratively in the grammar rules. The terminals I've created require more logic than I'd like. You may well reiterate the point that Excel formulas are not exactly normal, and I'd agree; that's one reason for
using them. Re-writing Ruby or Python would be of little educational value. My guess is that it would be equally challenging to write a grammar for VB. So I come back to the question about audience. Being a me-too parser is nice because it's in C# but it will always
be second best to the established tools for any serious parsing requirement. A value-add could be to make Irony great for creating parsers of all sorts more easily, perhaps where factors other than performance are the priority. Microsoft is trying to make C#
more declarative (well, less intentional, anyway) and that could be a theme for Irony too.
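To make the split suggested above concrete, here is a minimal sketch in Python (not C#, and not Irony's API). It assumes a tokenizer that attaches every matching candidate terminal to each lexeme and a parser-side step that keeps the first candidate the current state accepts. The terminal names, the patterns and the `tokenize`/`choose` helpers are all hypothetical, invented for illustration.

```python
import re

# Hypothetical terminals: every pattern that fully matches a lexeme becomes a
# candidate. These are illustrative only, not Irony's or Excel's real rules.
TERMINALS = [
    ("NUMBER", re.compile(r"\d+")),
    ("ROW_REF", re.compile(r"\d+")),
    ("CELL_REF", re.compile(r"[A-Za-z]{1,3}\d+")),
    ("NAME", re.compile(r"[A-Za-z_]\w*")),
    ("COLON", re.compile(r":")),
]

# Splits the source into lexemes; characters that match nothing (such as
# whitespace) are simply skipped in this sketch.
LEXEME = re.compile(r"\d+|[A-Za-z_]\w*|:")


def tokenize(source):
    """Return a list of (lexeme, candidate terminal names) pairs."""
    result = []
    for match in LEXEME.finditer(source):
        text = match.group()
        candidates = [name for name, pattern in TERMINALS if pattern.fullmatch(text)]
        result.append((text, candidates))
    return result


def choose(candidates, expected):
    """Parser-side step: keep the first candidate the current state accepts."""
    for name in candidates:
        if name in expected:
            return name
    raise SyntaxError(f"none of {candidates} is valid here; expected {sorted(expected)}")
```

For example, `tokenize("1:10")` labels both `1` and `10` as possibly NUMBER or ROW_REF, and a parser state that only expects ROW_REF would resolve each with `choose(candidates, {"ROW_REF"})`. A real LALR parser would need more than this single-step choice, since several candidates may stay viable for a while.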
Anyway, my $0.02. I'm not really saying anything new but since I've now created the grammar I'll be signing off, for a while anyway. So thanks for your help over the past few days.
Feb 7, 2008 at 7:06 AM
I will try to do my best to answer. There are actually several problems/questions raised in your post.
You found using Irony to be difficult at times - no XML comments, not enough examples and explanations; difficult for a newbie. Agreed. Again, it's not that I planned it this way, it is just not there yet. The root of the problems now is the
immaturity of Irony as a project, not any intrinsic defect.
You admit you're not an expert in parsers and LALR or any grammar building (neither am I). Unfortunately, there's no way to use a tool like this without some level of knowledge of parsing theory. Any tool you use would require an educational effort in this area. Irony
does not bring any theoretical innovation in parsing, so you don't need to know more than usual. Irony is functionally equivalent to Yacc/Lex, and recognizes the same universe of languages, and therefore requires more or less the same level of theoretical knowledge.
What I hope is different is that Irony's implementation is much more straightforward and modernized, more in line with OOP, .NET and C#. My purpose was to make the process easier, and I hope I WILL succeed - again, not yet, but I am working on this.
About the possible choice of a performant parser. I will again use this link:
Notice that these folks, with 20 years of experience in parsing, are releasing a custom-made parser (for just 2 languages - C# and VB) with "blazing" performance - 10k lines per second. They do use unmanaged C++ code (5k lines), I guess for speeding up critical pieces.
Irony's speed estimates beat these numbers! And Irony is for many languages, not just two! So it is not obvious that Irony is not a choice for a professional-grade parser. You somehow assume that Irony is a me-too parser and a second best - this is not the
case. Again, it is an LALR parser functionally equivalent to the Yacc/Lex pair.
As for the target audience - I don't have anything specific in mind. Irony is not a final goal in itself that I am trying to sell to some audience; Irony is a by-product of my quest into the problems of big business-oriented systems (ERP). I do ERP programming for a living,
and ERP is riddled with problems. When I realized that some of these problems trace back to an inadequate programming language, and maybe we need a new one, I decided to look around and see what tools are there for building a compiler for this new language. I didn't
like what I found so I decided to build something myself. What came out is something I thought would be useful for other people, so I published it after spending some time on preparing it for public view. My goal is ERP, not compiler tools; as for the progress
of Irony - I will keep moving it forward, but I hope other people will join me if they find the concept useful and worth their effort.
Problems with the scanner. Again, it seems to me - too many terminals! The problem is that you try to distinguish tokens with identical content as different terminals in different situations; to do this, you need "context" which is available only to the parser - this
is your problem! Distinguish only tokens that are really different - keep a minimum number of terminals - and let the parser decide the rest! The rest means the role of each token - is it a number or an index or a relative row/column reference. The scanner should NOT make
these distinctions at all - it should produce a single NUMBER token in all cases. I cannot be 100% sure, but it seems to me this is the root of the problem. Again, you need to have some understanding of the theoretical foundations of this stuff, so you can clearly
understand the LALR process and each module's responsibility. No offense.
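As a rough illustration of that division of labour, here is a small sketch in Python. It is a hand-written lookahead parser standing in for what LALR grammar rules would do, not Irony's API. The scanner emits one NUMBER terminal for every run of digits, and the parser assigns the role (literal or row reference) from the surrounding tokens. The token names, the tiny `NUMBER (+ NUMBER)*` / `NUMBER:NUMBER` grammar and all function names are assumptions made up for the example.

```python
import re

# One pattern per terminal; digits are always NUMBER, whatever their role.
TOKEN = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<COLON>:)|(?P<PLUS>\+))")


def scan(source):
    """Context-free scanner: returns a list of (terminal, text) pairs."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def parse_operand(tokens, i):
    """The role of each NUMBER is decided here, from context."""
    if i >= len(tokens) or tokens[i][0] != "NUMBER":
        raise SyntaxError(f"expected NUMBER at token {i}")
    text = tokens[i][1]
    # NUMBER COLON NUMBER: both numbers act as row references.
    if i + 2 < len(tokens) and tokens[i + 1][0] == "COLON" and tokens[i + 2][0] == "NUMBER":
        return ("row_range", int(text), int(tokens[i + 2][1])), i + 3
    # Otherwise the NUMBER is a plain numeric literal.
    return ("literal", int(text)), i + 1


def parse(source):
    """Parse operands separated by '+' into a flat list of nodes."""
    tokens = scan(source)
    node, i = parse_operand(tokens, 0)
    nodes = [node]
    while i < len(tokens):
        if tokens[i][0] != "PLUS":
            raise SyntaxError(f"expected PLUS at token {i}")
        node, i = parse_operand(tokens, i + 1)
        nodes.append(node)
    return nodes
```

With this sketch, `parse("1 + 3:7")` yields a literal `1` followed by a row range from 3 to 7, even though the scanner reported all three numbers with the same NUMBER terminal.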
That's the way I see it.
Feb 7, 2008 at 8:11 AM
Edited Feb 7, 2008 at 8:13 AM
As ever, thanks for your comments. I think you are being too defensive. I'm really delighted that you are trying to deliver Irony; I'm just suggesting that you may find it being used by people other than those you might associate yourself with because it
seems to me that Irony falls between two stools. Someone who knows this stuff well is likely to know "standard" tools, know how to use them and choose to use them (I read the comments on CodeProject). My contention is that your audience are others, like me,
and we will benefit greatly from more hand-holding. I've been programming for 25 years and have an MSc in CS (Lex/Yacc is that old!). I did the theory at Uni but it's a long time ago and I hoped Irony would help me, someone who was aware of this stuff but a
long time ago, get a parser built quickly.
Don't get me wrong, it helped enormously and if I were doing it again I'd be quicker because I now know more about what the code is doing. It would just be nice if Irony were more helpful/intelligent for the newbie. Yes, this may introduce performance issues but
that's a compromise we often make because we can't be expert in everything. I love WPF because I can get so much graphical stuff done and because it's declarative but it is ssss-llll-oooo-wwww. However if I want I can use GDI+ (or DirectDraw or one of the
other APIs) to get the job done more efficiently. Likewise, Irony could be used at a "lower" level, which operates faster, by those who don't need the hand-holding.
I take your point about re-defining NTs but it was an attempt to address the conflict. It seemed to work to an extent. Some of the re-definitions were also an attempt to make the grammar readable and so to make it maintainable.
By the way, on the comments, I've kicked code for a long time and I've learned to love comments. They seem redundant at the time because you know the code but pay dividends in the future (when you can no longer quite remember the purpose of every little quirk)
or to third parties who never knew them in the first place.