Some enhancements to Irony

Mar 28, 2013 at 12:58 AM
I have a few small changes to Irony that I have found useful. You may want to pick them up.

They are available at https://hg.codeplex.com/forks/jaybazuzi/irony.
In GrammarExplorer, when looking at a parse tree, if you expand a node with only one child, expand the child, too. This makes drilling in to parse trees much faster.
Irony generates "Unnamed" NonTerminals in these cases:
Foo.Rule = A + (B | C);   // `B|C` becomes `UnnamedX`.
When you see them in Grammar Explorer, it's hard to know what you're looking at. So I put the rule in to the Name field, which makes it visible in Grammar Explorer. (I think it's fair to assume that no code is depending on the value of Name for Unnamed NonTerminals, because the name is not stable.)
This makes it easier to understand the LogMessage in the debugger.
I wanted this for PowerShell's syntax, which recognizes backtick+CR+LF as a
single whitespace.
remove client from target framework, to ease building with MonoDevelop on Linux
Coordinator
Mar 28, 2013 at 3:45 AM
thank you! I'll have a look
Roman
Apr 3, 2013 at 4:17 AM
Here's another, in Irony/Parsing/Grammar/Grammar.cs:
@@ -89,10 +89,10 @@
     #region Reserved words handling
     //Reserved words handling
     public void MarkReservedWords(params string[] reservedWords) {
-      foreach (var word in reservedWords) {
-        var wdTerm = ToTerminal(word);
-        wdTerm.SetFlag(TermFlags.IsReservedWord);
-      }
+      MarkReservedWords(reservedWords.Select(ToTerm).ToArray());
+    }
+    public void MarkReservedWords(params KeyTerm[] reservedWords) {
+      Array.ForEach(reservedWords, wdTerm => wdTerm.SetFlag(TermFlags.IsReservedWord));
     }
     #endregion
This allows you to use MarkReservedWords in a typesafe manner.

I know that Pull Requests are probably a better way to send these, but (for better or worse) I have learned GIT and am clumsy in Mercurial.
Apr 6, 2013 at 12:19 AM
I see you are using events in a few places, like this:
    public event EventHandler<ParsingEventArgs> TokenCreated;

    internal void OnTokenCreated() {
      if (TokenCreated != null)
        TokenCreated(this, SharedParsingEventArgs);
    }
It's possible to eliminate the null check like this:
    public event EventHandler<ParsingEventArgs> TokenCreated = delegate {};

    internal void OnTokenCreated() {
      TokenCreated(this, SharedParsingEventArgs);
    }
That even makes it feasible to remove the On...() method, if you like.
Coordinator
Apr 6, 2013 at 12:52 AM
About the last suggestion - the point of the check is to not fire event if nobody listens. In your version event is invoked, and inside multicast it discovers that there are no handlers, so it returns. My version is a bit more efficient I guess. Using "On..." methods is a kinda recommended, pretty standard practice, although I violate all standard practices everywhere, I admit.
Apr 6, 2013 at 1:00 AM
It's largely a matter of taste - take it or leave it.

The standard practice is actually even more complex:
protected virtual void OnSomethingHappened(EventArgs e) 
{
    EventHandler handler = SomethingHappened;
    # Possible race here
    if (handler != null) 
    {
        handler(this, e);
    }
}
That's because there's a race condition if the only handler of the event unsubscribes on another thread at the wrong time - the event will go to null and then throw when you try to invoke it. That's why I like the = delegate {} approach - all this mess just goes away.

As for performance, this is the kind of micro-optimization that rarely produces any benefit for the user.

So, pick the flavor you like, and go with it.
Apr 6, 2013 at 1:02 AM
Here's another code offering, in ValidateTokenEventArgs:
    /// <summary>
    /// Replace the current token with a token based on <paramref name="newTerminal"/>, while keeping everything else 
    /// the same as the current token.
    /// </summary>
    /// <param name="newTerminal"></param>
    public void ReplaceToken(Terminal newTerminal) {
      ReplaceToken(new Token(newTerminal, Context.CurrentToken.Location, Context.CurrentToken.Text, Context.CurrentToken.Value));
    }
It lets the client code be slightly cleaner.
Coordinator
Apr 6, 2013 at 7:23 PM
about the 'delegate()' thing.. one thing to keep in mind when writing a fwk like Irony - need to reduce extra cognitive overload. There's a lot of things in Irony already to wrap your mind around, coming purely from algorithmic/LALR parsing roots, so adding these little 'tricks', as nice as they are in a proprietary, private code, might be sort of harmful in public code bases, especially if they have little to do with the main theme. So I would be very careful in adding any kind of niceties that can cause extra stop and question 'what the heck is that?', even if momentarily
Apr 6, 2013 at 7:40 PM
That empty delegate pattern is quite widespread, it shouldn't cause WTF question even for 6 months junior developer.
Coordinator
Apr 6, 2013 at 7:45 PM
widespread is not the same as 'everybody knows'. It was new for me - and I'm not newbie in c#, although not an active explorer of new tricks
Apr 6, 2013 at 8:51 PM
I want to remind everyone that Roman doesn't have to convince us -- we have to convince him.

The trick is old now - I have been using it since the beta of C# 2.0 / VS2005. But I love learning new tricks, and Roman isn't me.

I don't like the suggestion that Roman doesn't mean your expectations for a "6 months junior developer". Every developer learns their craft in different ways. His coding style is quite different than mine, but that doesn't mean he's doing it wrong. Roman has demonstrated his ability merely by creating this wonderful parsing engine that is so useful for us.
Apr 6, 2013 at 10:09 PM
The pattern of adding a NOP to avoid the check.

The delegate sample.

I have seen multiple forms of it.

"where 1=1 " & vbStringOfCondition
or
var s=""; if (string.IsNullOrEmpty(s))

This is an anti-pattern because mixes the concept of lack of assignment with emptiness.

I know that is easy, but is not simpler because it produces an entanglement of concepts (easy=near, simple=untangled).

And yes, a 6 month programer will easily understand and produce this anti-pattern, a 20 years developer knows better.

and btw, there is a huge difference between 10 years of experience and 1 year of experience 10 times.

I side with roman on this. The day that a 6 month programmer can produce a lexer and a parser of the quality of Irony maybe this kind of arrogant suggestions have a place
Coordinator
Apr 7, 2013 at 12:55 AM
Edited Apr 7, 2013 at 2:46 AM
guys, take it easy, I don't think Eugene's post was questioning mine or anybody's expertise, so we're cool
here's my 5 cents. First, I was always a bit puzzled by this obvious inconsistency of event fields. When you sign-up (add handler) you don't need to check anything, compiler or system creates multicast delegate instance if it does not exist. But on invoke, you have to check explicitly. Kinda strange.
Secondly, here's the suggested workaround again:
public event EventHandler<ParsingEventArgs> TokenCreated = delegate {};
Do you understand what exactly happens here? I don't. First of all, syntax - quite a puzzle. What is this, call to constructor? There's no 'new' keyword. 'delegate' is a keyword, mentioned in c# manual in quite different context. Quite strange construct for regular c# guy, and this is the 'WTF' situation I'm talking about.
Secondly, delegate instances (multicast delegates) are supposed to be strongly typed objects. When adding handlers or invoking it, you must strictly comply with delegate's declaration pattern. If you don't, compiler rejects it. Then what kind of instance is is created here and put into the field?!
As far as I can see, the delegate instance in fact is not generic, one 'untyped' class is used for everything, and compiler type checking is pure compiler-added trick.
But this magic-ry, relying on c# underlying secrets - and strange syntactic constructs - kinda goes against my strongly typed mindset. It maybe exposes 'how things really work underneath', but obviously breaks the illusion of strongly-typed events. So I prefer to live under illusion. And not disillusion others if I can :)))
have a nice weekend guys!!!
Stop staring at the computer, it's Saturday, GO OUTSIDE, there's nature and all kind of fun there!
Roman

(Edited)
Apr 8, 2013 at 8:25 AM
Edited Apr 8, 2013 at 8:25 AM
First of all, I'm actually didn't questioning anybody's expertise. My only concern is code readability, that is very important for open source projects. I know that Roman is trying to get maximum performance from Irony, but I'm not sure that fractions of percent are worth clogging code with null-checks. IIRC, calling empty delegate is only 2 times slower than null-check.

Empty delegate pattern is not anti-pattern at all. It's implementation of Null Object pattern, that developer with 20 years experience should know.
Do you understand what exactly happens here? I don't. First of all, syntax - quite a puzzle.
Maybe following syntax is more clear?
public event EventHandler<ParsingEventArgs> TokenCreated = new EventHandler<ParsingEventArgs>(delegate {});
Another option is to use helper method "Raise" that will encapsulate null-checks. This approach has another plus that we can easily update event-raising code to avoid race-conditions in multi-threaded environment. Not sure if multi-threading parsing is on Irony's roadmap though.
Coordinator
Apr 8, 2013 at 7:51 PM
nope, that does not make it more clear. TokenCreated field is a MultiCastDelegate - a container for a list of listeners, handlers. What you assign is an empty handler. Makes much sense? Nope. What you're doing is singing up an empty delegate to non-initialized 'container'. Makes sense only if you know that underneath the compiler injects the code that checks the field for NULL and automatically initializes it. For me, it is quite 'wtf' situation. Just for avoiding simple 'if(TokenCreated!=nul)..' ?!!!
My point is that 'if(TokenCreated!=null)' is really a common, basic, everywhere-in-MSDN thing (hate the word pattern), so using it has no chance of 'whats that?' mental stop. Anything else - not worth any potential gains, and nano-second perf differences are irrelevant - agree on that. That's for readability.
'Raise' method - recommended is OnXXX protected method, like OnTokenCreated. I sometimes skip it if event is raised from one place only. Multi-threading - any race is possible only at parser construction time, when handlers signup from the grammar - this is single-threaded process by definition. For runtime, parsing time, your are free to use parallel parsers using shared ParserDAta instance (not contexts); the handlers are only invoked, and it is safe to do from multiple threads.
Apr 8, 2013 at 8:07 PM
Multi-threaded parsing is irrelevant: if the parser is running on one thread, and I remove my last listener on another thread, at just the right time, it will crash. Irony is still single-threaded.

The multi-threading concern is very small, because the chances of someone removing the last listener on one thread, while it's firing on another thread, are almost 0. People usually don't remove listeners at all, so they never go to null throughout the lifetime of the program.

However, race conditions, when they do arise, can be extremely difficult to diagnose - a crash report from a user that only appears 1 time in a 1000, so no one gets you a callstack. Instead of asking myself "for this event handler, is it important to make a local copy before I fire it?", and just write = delegate {} and I'm done.

It has been a long time since I analyzed why this trick works, but I think it's not because of a compiler-generated null-check, but because all delegates are multicast. Consider:
            Action x = delegate { Console.WriteLine("hi"); };
            x += delegate { Console.WriteLine("bye"); };
            x();

            x = delegate { Console.WriteLine("no!"); };
            x();
produces
hi
bye
no!
The fact that this is "WTF" for Roman is a good reason for him not to use it. Every developer has their own taste for learning new language tricks.

I still think there's value in this pattern:
  • No need to write null check
  • No worries about multi-threading
  • The = delegagate { }; is much less code
  • No need to write a special Fire... method, just skip it
  • It just works
  • It's just an implementation detail - whatever approach you choose doesn't affect consumers of your code.
So I hope that everyone else reading this thread will think "oh, I'm going to use that from now on!", even if Roman doesn't.
Apr 8, 2013 at 10:04 PM
Edited Apr 8, 2013 at 10:10 PM
I love this debate, so I thought I would chime in.

Here is a great Stack Overflow response to a similar topic: http://stackoverflow.com/a/292840

I wanted to offer up a solution that addresses the redundant null checks, yet keeps them intact.

What if an extension method is added to do the null check?
// non-generic
[MethodImpl(MethodImplOptions.NoInlining)]
public static void Raise(this EventHandler handler, object sender, EventArgs e)
{
    if(handler != null)
        handler(sender, e);
}

// generic
[MethodImpl(MethodImplOptions.NoInlining)]
public static void Raise<T>(this EventHandler<T> handler, object sender, T e)
where T : EventArgs
{
    if(handler != null)
        handler(sender, e);
}
sorry, I haven't tested these, but you should get the idea.


This will remove the redundant logic, which I believe was the point of the original concern, and should be simple to implement.

-Kevin
Coordinator
Apr 9, 2013 at 12:07 AM
extension method wouldn't work. Simply because of 'event' keyword - it prohibits raising event from outside the class instance - read about 'event' in MSDN, this is in fact my favorite question on job interviews - what's the difference with and without 'event' keyword in delegate field declaration
Coordinator
Apr 9, 2013 at 12:14 AM
Jay,
Irony is multi-threaded - provide a single restriction is met - ParserData is readonly. Adding/removing event handlers to terminals or grammar or whatever - it is modifying grammar data. And it seems to me quite unreasonable thing to do, during parsing. You should setup everything, including event handlers, at parser construction time - when you build ParserData. Once ParserData is built, you can share it between a hundred parsers running in parallel.
Another thing, about the trick with assigning to local variable before invoking the event. I don't think it solves anything. It might matter in case of 0 or 1 handlers signed up, but it will definitely crash if you have 4 handlers signed up, one thread is adding hander #5, and another thread is invoking event. The whole thing will crash, as expected - as any normal list/collection crashes if you read it while another thread modifies it. Assigning to local variable does not change anything, delegates are not value types, it's the same object with list of handlers inside.
Coordinator
Apr 9, 2013 at 12:20 AM
Jay, and in your last post statement 'No worries about multi-threading' when using this delegate thing - no, multi-threading is still an issue in this case, as I explained in previous post, same stuff applies.
Apr 9, 2013 at 12:36 AM
rivantsov wrote:
Another thing, about the trick with assigning to local variable before invoking the event. I don't think it solves anything. It might matter in case of 0 or 1 handlers signed up, but it will definitely crash if you have 4 handlers signed up, one thread is adding hander #5, and another thread is invoking event. The whole thing will crash, as expected - as any normal list/collection crashes if you read it while another thread modifies it. Assigning to local variable does not change anything, delegates are not value types, it's the same object with list of handlers inside.
If I'm reading this correctly, you seem to be saying that multicast delegates are inherently unsafe across threads - that they can't tolerate mutation while being fired.

I was surprised, so went Googling, and found this: http://blogs.msdn.com/b/ericlippert/archive/2009/04/29/events-and-races.aspx
Remember, multi-cast delegates are immutable; when you add or remove a handler, you replace the existing multi-cast delegate object with a different delegate object that has different behaviour.
So, if someone (un)subscribes to your event while it is firing, it will not crash the event.

That article covers a separate race condition, that if you unsubscribe from an event, your handler might still get fired for a short time afterwards, which is a problem if you destroy state you need to correctly handle the event. But that's an entirely separate issue, and not the responsibility of the code firing the event.
Apr 9, 2013 at 1:44 AM
Roman,

I have read the msdn event keyword article again, and I see what you are referring to.

"Events are a special kind of multicast delegate that can only be invoked from within the class or struct where they are declared (the publisher class)."

However, if you take a look at the first paragraph of the extension method article you will see that it is a special static method that executes the call as if was an instance of the class. It's special this syntax and the fact that it's static allow it to overcome the hurdle you are referring.

Give it a try, I have found several threads that illustrate this approach and I have even used this approach. Here are a few links. In the comments of the first, you will see the explanation of the attributes, and the third shows some illustrative performance differences between all 3 methods. However, performance in a real world app should be irrelevant because it is so small.

http://stackoverflow.com/questions/231525/raising-c-sharp-events-with-an-extension-method-is-it-bad

http://www.dailycoding.com/Posts/top_5_small_but_must_have_extension_methods.aspx

http://blog.drorhelper.com/2009/02/three-different-ways-to-throw-events-in.html

Kevin
Coordinator
Apr 9, 2013 at 5:34 AM
wow... you guys keep surprising me - and I thought I knew c#. I need to try it myself,
thanks!
Roman