Newbie question about continuation line char

Feb 25, 2010 at 8:32 PM


First of all, I want to say that this project is fantastic, exactly what I need.

I'm trying to write a grammar for our legacy language, but I'm having some difficulties.
Maybe it's my limited experience with parsers. I read the dragon book many years ago... and now
I'm reading it again!!!

The first problem is how to handle the line continuation character.
Our legacy language has some similarities with Visual Basic. We can write code like this:

find table[xxx] _ '  comment 1
      key[yyy]  _ ' comment 2


This is one statement, split across three lines; like VB, we use the underscore to join the lines. The statement
is closed by the final newline.
We also allow a comment at the end of the line, AFTER the underscore...

To start, I wanted to solve a simpler problem, so suppose we only have statements like this:

find table[xxx] _ 
      key[yyy] _ 

where the comments have been removed.

I did a little research on the internet and found that some parsers let you define whitespace with a
regular expression:

Whitespace     = {WS}+
               | '_' {WS}* {CR}? {LF}?

With this little "trick" they handle the line continuation character easily.

How can I represent this construct with Irony?

Is there any plan to allow WhiteSpaceChar to be specified as a regex?
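For readers unfamiliar with the trick, here is a minimal sketch of the same idea outside any parser toolkit, using plain .NET regular expressions. It is not Irony code. The class and method names are made up for illustration, and unlike the rule quoted above it assumes the underscore must be preceded by a blank and followed by a real line break, so that names such as a_b are left alone.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Irony: collapses "_ <line break>" into one blank
// so that a continued statement reaches the scanner as a single line.
public static class ContinuationWhitespace
{
    // One or more blanks, "_", optional blanks, a line break,
    // then the indentation of the following line.
    private static readonly Regex Continuation =
        new Regex(@"[ \t]+_[ \t]*\r?\n[ \t]*");

    public static string JoinLines(string source)
    {
        return Continuation.Replace(source, " ");
    }
}
```

Joining lines in a pre-pass like this changes line numbers, so positions reported by a parser would refer to the joined text rather than to the original file.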


Feb 25, 2010 at 11:22 PM
Edited Feb 25, 2010 at 11:24 PM

I think you can use CommentTerminal to express this, but you have to hack it. You need to make it "eat" the newLine symbol. Look inside the CommentTerminal.CompleteMatch method, around line 102. It "eats" the end symbol only if it is NOT newLine. Add an extra public bool property "ConsumeEndSymbol", default true, and use it in the "if" condition there. Add an extra constructor parameter for this value.

Now you can use it: the start symbol would be "_" and the end symbol "\r\n", with ConsumeEndSymbol = true.

Note that you still need another CommentTerminal for comments starting with quotes; for these, set ConsumeEndSymbol to false.

I will make these changes in the next code drop; for now, just hack it yourself.
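To make the effect of the proposed flag concrete, here is a small standalone model of it. This is not the Irony source and does not reproduce CommentTerminal.CompleteMatch; the class, method and parameter names are hypothetical, and the only thing taken from the post is the idea of a consumeEndSymbol switch.

```csharp
using System;

// Hypothetical model of a start/end-delimited token, not Irony's CommentTerminal.
public static class CommentScanSketch
{
    // Returns the index just past the matched text, or -1 if startSymbol
    // does not occur at position 'start'.
    public static int MatchEnd(string text, int start, string startSymbol,
                               string endSymbol, bool consumeEndSymbol)
    {
        if (string.CompareOrdinal(text, start, startSymbol, 0, startSymbol.Length) != 0)
            return -1;

        int endPos = text.IndexOf(endSymbol, start + startSymbol.Length,
                                  StringComparison.Ordinal);
        if (endPos < 0)
            return text.Length; // unterminated: the token runs to the end of input

        // true:  the line break becomes part of the token (continuation, statement goes on)
        // false: the line break is left for the scanner (ordinary comment, statement ends)
        return consumeEndSymbol ? endPos + endSymbol.Length : endPos;
    }
}
```

With the flag set to true, the continuation token swallows the line break, so the grammar never sees a statement terminator there. With the flag set to false, a quote comment stops just before the line break and the statement still ends normally.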


Mar 2, 2010 at 2:27 PM

Hi Roman,

Thank you for the hint, it works very well: the continuation lines can be treated as a "comment block" that starts with "_" and ends with \n.

I think this kind of solution fits the line continuation character so well that it would be useful to implement a specific Terminal for it (in the standard set).

I also understand the role of NonGrammarTerminals now: you can handle particular situations without complicating the grammar.

Again, thank you!

Mar 2, 2010 at 5:20 PM

I'm glad it works, but there's one problem: if you have identifiers that start with "_", like "_myField", then this continuation/comment terminal will catch them and interpret the underscore as a continuation symbol. There must be a little more analysis there, so I agree this must be a specialized terminal.
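As a rough illustration of the "little more analysis" mentioned here, the check below treats an underscore as a continuation only when it stands alone and nothing but blanks, an optional quote comment, and a line break follow it. It is a standalone sketch with hypothetical names, not an Irony terminal, and the rules it encodes are assumptions based on the examples in this thread.

```csharp
using System;

// Hypothetical helper, not part of Irony.
public static class ContinuationCheck
{
    public static bool IsContinuation(string text, int pos)
    {
        if (pos >= text.Length || text[pos] != '_')
            return false;

        // An underscore glued to the end of a name (a_ or a__) is part of that name.
        if (pos > 0 && (char.IsLetterOrDigit(text[pos - 1]) || text[pos - 1] == '_'))
            return false;

        // Skip blanks after the underscore.
        int i = pos + 1;
        while (i < text.Length && (text[i] == ' ' || text[i] == '\t'))
            i++;

        // Assumption: an underscore at the very end of input continues nothing.
        if (i >= text.Length)
            return false;

        // Only a line break or a quote comment may follow; anything else
        // (for example the "m" in _myField) means this is an identifier.
        return text[i] == '\r' || text[i] == '\n' || text[i] == '\'';
    }
}
```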

Mar 2, 2010 at 5:25 PM

One more thing: it might seem that this problem with identifiers starting with underscores can be avoided by assigning a lower priority to the continuation terminal. However, I'm now changing the way NonGrammarTerminals work (for different reasons, but an important enough case), so NonGrammarTerminals will always be the first ones the scanner tries. So playing with priority won't work at all in the future. I will put this new continuation terminal on my to-do list, which keeps growing and growing :(

Feb 26, 2011 at 12:34 AM

Is the new continuation terminal available yet?

I have a similar issue, except that the grammar expresses the continuation as a single "&" at the start of the next line. 


if x = 0 then (
&  set y = 0)

Any ideas?

Mar 2, 2011 at 4:25 PM

Sorry for the long delay in replying. No, things are the same; I did not get to it. Try hacking something yourself and let me know how it goes.


Mar 2, 2011 at 6:19 PM
Edited Mar 2, 2011 at 6:22 PM

Thanks for the reply.


I have managed to create the following hack:


 var lineContinuation = new CommentTerminal("line_continuation", "\n&", " ");


It sort of relies on the assumption that the programmer has indented the continuation lines with spaces. Luckily that does appear to have been the case so far, but it isn't guaranteed. It is only a matter of time before I hit code that is itself a hack and isn't properly indented.
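One way around the indentation assumption, sketched here as a plain .NET pre-pass rather than as an Irony terminal, is to match the line break, any leading blanks, and the "&" together, whether or not a space follows. The class and method names are hypothetical, and the sketch assumes that "&" as the first non-blank character of a line always means continuation.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Irony: joins a line that starts with "&"
// onto the previous line before the text is handed to the parser.
public static class LeadingAmpersand
{
    // A line break, optional indentation, "&", then optional blanks after it.
    private static readonly Regex Continuation =
        new Regex(@"\r?\n[ \t]*&[ \t]*");

    public static string JoinLines(string source)
    {
        return Continuation.Replace(source, " ");
    }
}
```

Because the pattern does not require a blank after the "&", a line such as "&set y = 0)" is joined as well. The trade-off is the same as with any pre-pass: line numbers in the joined text no longer match the original source.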

I have gone from zero (yes, totally zero) to a fully working complex grammar in three days so I have to congratulate you on a fantastic piece of software.


I intend to have a look at hacking together a special terminal, because I know I will need it in two further grammars, and those won't be quite so well structured with indentation (or even a space, for that matter).

So I think I will have a few questions shortly. I hope you don't mind.