Reusing a parser in a multithreaded process

Mar 10, 2010 at 6:02 PM

Hello!  I'm curious about three things: 1) Can a parser be reused to parse multiple scripts? 2) Is the parser thread safe? 3) If not, is the Grammar/LanguageData thread safe?  Can I create 10 Parser instances off the same LanguageData and concurrently parse 10 scripts without issue?

I've glanced over the source and I think these are the answers:

1) You can reuse a Parser (Irony.Parsing.Parser) to parse multiple scripts.

2) The parser is NOT thread safe.

3) ? - Not sure without further analysis.

I'm developing a server that will allow users to submit templates.  Say there are 100 users logged in and 25 of them post a new template at roughly the same time.  I will need to parse those concurrently (or at least, parse as many as I have threads available in the thread pool).  Do I need to instantiate a new Grammar, LanguageData, and Parser for each of these operations?  How expensive is it to instantiate these?  Can I share the same LanguageData amongst all Parser instances?

 

Thanks!

Coordinator
Mar 10, 2010 at 7:39 PM
Edited Mar 10, 2010 at 9:23 PM

The multi-threading story is as follows: LanguageData is thread-safe and can be shared between threads, so you can create it once and place it in a server-wide cache. Creating LanguageData is expensive because it involves constructing the parsing automaton. Each thread should create its own Parser object from the shared LanguageData and do its job. Creating a Parser is cheap; it is just an object with a few fields.
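
For readers who want to see the shape of this, here is a minimal sketch of the pattern described above: one shared LanguageData, one short-lived Parser per job. SumGrammar, SharedLanguage, Parse and ParseConcurrently are hypothetical names invented for this illustration (your own grammar class goes where SumGrammar is); the sketch assumes an Irony build in which Parser has a constructor taking a LanguageData.

```csharp
using System.Threading;
using Irony.Parsing;

// Hypothetical stand-in grammar for illustration: sums such as "1 + 2 + 3".
public class SumGrammar : Grammar
{
    public SumGrammar()
    {
        var number = new NumberLiteral("number");
        var sum = new NonTerminal("sum");
        sum.Rule = number | sum + "+" + number;
        Root = sum;
    }
}

public static class SharedLanguage
{
    // Built once per process. The CLR runs a static field initializer
    // exactly once, so the expensive automaton construction is not repeated.
    public static readonly LanguageData Data = new LanguageData(new SumGrammar());

    // Each call creates its own cheap Parser on top of the shared LanguageData.
    public static ParseTree Parse(string source)
    {
        var parser = new Parser(Data);
        return parser.Parse(source);
    }

    // Parses every script on its own thread and returns the trees in input order.
    public static ParseTree[] ParseConcurrently(string[] scripts)
    {
        var trees = new ParseTree[scripts.Length];
        var threads = new Thread[scripts.Length];
        for (int i = 0; i < scripts.Length; i++)
        {
            int index = i; // copy the loop variable for the closure
            threads[i] = new Thread(() => trees[index] = Parse(scripts[index]));
            threads[i].Start();
        }
        foreach (var thread in threads)
        {
            thread.Join();
        }
        return trees;
    }
}
```

A caller would check each result with tree.HasErrors() before using tree.Root. On a real server the jobs would more likely run on thread-pool threads than on dedicated ones; plain threads are used here only to keep the sketch short.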

Mar 10, 2010 at 7:56 PM

Awesome!  That's exactly what I needed to know.  Thanks!

May 24, 2012 at 2:54 AM
Edited May 24, 2012 at 2:56 AM

Is a Grammar thread-safe? I'm using Irony to enable simple inline C#-like text functions in a view-engine sort of scheme for a website. I started out with a global instance of the Parser but quickly realized that was not thread-safe. Then I made just my FunctionGrammer global. That seems to work, except that I've seen some weird errors like:

Exception: System.IndexOutOfRangeException
Message: Index was outside the bounds of the array.
Source: mscorlib
   at System.Array.Clear(Array array, Int32 index, Int32 length)
   at System.Collections.Generic.List`1.Clear()
   at Irony.Parsing.Construction.GrammarDataBuilder.CreateProductions() in C:\code\Irony\Parsing\Data\Construction\GrammarDataBuilder.cs:line 163
   at Irony.Parsing.Construction.GrammarDataBuilder.Build() in C:\code\Irony\Parsing\Data\Construction\GrammarDataBuilder.cs:line 40
   at Irony.Parsing.Construction.LanguageDataBuilder.Build() in C:\code\Irony\Parsing\Data\Construction\LanguageDataBuilder.cs:line 37
   at Irony.Parsing.LanguageData.ConstructAll() in C:\code\Irony\Parsing\Data\LanguageData.cs:line 38
   at Irony.Parsing.LanguageData..ctor(Grammar grammar) in C:\code\Irony\Parsing\Data\LanguageData.cs:line 34
   at Irony.Parsing.Parser..ctor(Grammar grammar) in C:\code\Irony\Parsing\Parser\Parser.cs:line 28

I should note that I'm using the last available .NET 3.5 version of Irony... Also, to create a LanguageData object, is it this simple?

        /// <summary>
        /// Call this and store it globally
        /// </summary>
        public static LanguageData LanguageData()
        {
            return new LanguageData(new FunctionGrammer());
        }

Coordinator
May 24, 2012 at 3:05 AM

LanguageData should be a static object, created once and shared. ParsingContext is the one that should be created per thread and per parsing job.

Parser is also thread-safe (should be), but it is not expensive to construct, so you can create it on each thread as well.

What do you mean by "you realized Parser is not thread-safe"? It should be, as long as you create a parsing context per thread.

May 24, 2012 at 2:35 PM

Thanks for the quick response!

Regarding the Parser being thread-safe.. I meant that MY usage of Parser (as a static object) was not thread-safe, not necessarily the Parser object itself. This was entirely due to my lack of understanding of the Irony object model. Speaking of which...

Can you point me to any samples/snippets of how to properly set up a Parser (static LanguageData and ParsingContext I presume) that will be used in a multi-threaded environment?

Currently, I have my Grammar as a static object and create a new Parser for each text function I need to parse in a given thread (you can view the source here). So a thread might get a chunk of text like this:


<h2>
 @if([Start Date Override Text]=="none",@formatdate([Start Date],"dddd MMMM d"),[Start Date Override Text])
</h2>
<div>
  <p>@if([More Info URL]=="none"," ","<a href='[More Info URL]'>MORE INFO</a>")</p>
</div>


I first use Regex to grab the functions to parse, in this case:

  • @if([Start Date Override Text]=="none",@formatdate([Start Date],"dddd MMMM d"),[Start Date Override Text])
  • @if([More Info URL]=="none"," ","<a href='[More Info URL]'>MORE INFO</a>")

These are then parsed with a new instance of the Parser per function. I then store the ParseTrees for each function (to avoid the cost of re-parsing the same function) in memory (static Dictionary). 
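
One thing to watch with a static Dictionary in this setup: a plain Dictionary is not safe for concurrent writes, so the cache needs its own synchronization. Below is a minimal sketch of a locked cache that would also work on .NET 3.5. ParseCache and GetOrParse are hypothetical names invented for this illustration, TTree stands in for Irony's ParseTree, and the sketch assumes cached trees are only read after they are stored.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration: caches one parse result per function text.
public sealed class ParseCache<TTree>
{
    private readonly Dictionary<string, TTree> _trees = new Dictionary<string, TTree>();
    private readonly object _gate = new object();
    private readonly Func<string, TTree> _parse;

    public ParseCache(Func<string, TTree> parse)
    {
        if (parse == null) throw new ArgumentNullException("parse");
        _parse = parse;
    }

    public TTree GetOrParse(string functionText)
    {
        lock (_gate)
        {
            TTree cached;
            if (_trees.TryGetValue(functionText, out cached)) return cached;
        }

        // Parse outside the lock so one slow parse does not block other threads.
        TTree parsed = _parse(functionText);

        lock (_gate)
        {
            // Another thread may have stored a result meanwhile; keep the first one.
            TTree existing;
            if (_trees.TryGetValue(functionText, out existing)) return existing;
            _trees[functionText] = parsed;
            return parsed;
        }
    }
}
```

The delegate passed to the constructor would create a new Parser for each call, as described above. Two threads can still parse the same text at the same time on a cold cache; the first stored result wins and the duplicate is discarded. On .NET 4 or later, ConcurrentDictionary with GetOrAdd would be a shorter alternative with a similar trade-off.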

I'm sure there is plenty I could refactor to gain efficiency and stability, and I'm open to any suggestions.

Thanks!

Coordinator
May 30, 2012 at 5:27 PM

There are no specific samples to look at. Just a guideline: set up LanguageData once and make it a static singleton, then reuse it with parsers/parsing contexts created for specific tasks. For some basic code showing how to do it, inspect the code in the main form in Grammar Explorer.