Irony to translate from language A to B.

Aug 28, 2011 at 3:49 AM

I am looking into a project that would involve translating from one language to another. I am looking into Irony as I find the overall approach of this parser more elegant compared to the lex/yacc approach I have taken in the past (long time ago).

Now, I am a little unsure of what approach to take. Should I just create a grammar and generate the translation by walking through the parse tree? Or would it make more sense to treat the problem more like an interpreter and create specific AST nodes, then have the resulting tree evaluate itself into destination code? Some of the constructs within the interpreter seem poorly suited to the task, but at the same time walking an AST and constructing the code hierarchically makes sense to me.

Also, what is the best way to deal with "c style" preprocessor directives? Is it best to do a two-pass approach, or simply to deal with preprocessor directives as if they were part of the grammar? In some ways, it seems like on-the-fly text manipulation would be problematic for the parser.

Thanks,

Sebastien

Coordinator
Aug 29, 2011 at 5:25 PM

Hi! 

About the approach to translation: I would suggest going without an AST; the parse tree alone should be enough.
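A minimal sketch of the parse-tree-walking approach, written in Python rather than against Irony's actual API. The `ParseNode` class, the `variable_decl` child layout, and the `TYPE_MAP` table are all hypothetical stand-ins for whatever the real parser produces; only the HLSL-to-GLSL type names themselves are standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParseNode:
    """Hypothetical parse-tree node: a grammar term plus children, or a leaf token."""
    term: str
    text: Optional[str] = None
    children: List["ParseNode"] = field(default_factory=list)


# A few HLSL type names and their GLSL counterparts (illustrative subset)
TYPE_MAP = {"float2": "vec2", "float3": "vec3", "float4": "vec4", "float4x4": "mat4"}


def translate(node: ParseNode) -> str:
    """Walk the tree depth-first and build the target text bottom-up."""
    if node.text is not None:  # leaf token
        if node.term == "type":
            return TYPE_MAP.get(node.text, node.text)
        return node.text
    parts = [translate(child) for child in node.children]
    if node.term == "variable_decl":
        # Assumed child order: type, ID, optional initializer expression
        decl = f"{parts[0]} {parts[1]}"
        if len(parts) > 2:
            decl += " = " + parts[2]
        return decl + ";"
    # Fallback for rules without special handling: join the children
    return " ".join(parts)


tree = ParseNode("variable_decl", children=[
    ParseNode("type", "float4"),
    ParseNode("ID", "color"),
])
print(translate(tree))
```

No separate AST classes are needed here; each grammar rule that requires special output gets one branch in the walker.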

As for the c-style pre-processor - that's bad news, the support for this is not implemented yet. But I think it's doable, so it's a good time to start implementing it - why don't you try it? What preprocessor commands do you need? Some would be relatively easy, others may be challenging.

Roman

Aug 29, 2011 at 5:36 PM
Hi.

I thought AST would be the simplest. It just seemed like there were
constructs in the evaluator that didn't really make sense, but I guess
nothing prevents me from also deriving my nodes from my own interface,
which would provide a more appropriate "context" for translation.

For the pre-processor, just basic c style #define/#ifdef type
processing. I don't have a problem implementing this and sharing my
changes if need be. I may require some guidance since I am new to the
Irony codebase.

I did manage to get my grammar in place (essentially parsing the HLSL
shader language with the intent of converting to a variant of GLSL). I
have run into a number of conflicts, most of which I have solved; the
few remaining ones are a little more tricky, at least without context.
I don't have access to the code, but in pseudo code, for example:

variable_decl : type ID ( assignment )? SEMICOLON
function_decl : (type | VOID) ID ( LPAREN argumentList RPAREN)?
type : builtInType | ID

The problem is that type must have an ID as an option since, for
example, a structure would be defined ahead of time and would be
referenced by its name. Is there a way to solve something like this
without any context? If not, is it possible to store some context as
the tree is built (i.e. a symbol table)?

Also, I noticed changes went in to allow for semi-automatic
conflict resolution. How stable is the main code branch? This could
help resolve a few of my other remaining conflicts. Also, are there any
plans for a more per-rule based "ResolveInCode", to avoid a complex
function when having to resolve conflicts for multiple rules?
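One common way to get that context is to keep a table of known type names while scanning, so that an identifier naming a previously declared struct is reported as a type rather than a plain ID (the same idea C parsers use, often called the "lexer hack"). The sketch below is in Python and is not Irony code; `classify`, the token kinds, and the tiny regex tokenizer are hypothetical and only illustrate the bookkeeping.

```python
import re

BUILTIN_TYPES = {"float", "float2", "float3", "float4", "int", "bool"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|[{}();=,]")


def classify(source):
    """Return (kind, text) pairs, tagging identifiers that name known types."""
    type_names = set(BUILTIN_TYPES)  # the "symbol table" for type names
    tokens = TOKEN_RE.findall(source)
    result = []
    for i, tok in enumerate(tokens):
        if tok == "struct":
            result.append(("KEYWORD", tok))
            # Register the struct name as soon as its declaration starts
            if i + 1 < len(tokens):
                type_names.add(tokens[i + 1])
        elif tok in type_names:
            result.append(("TYPE", tok))
        elif tok[0].isalpha() or tok[0] == "_":
            result.append(("ID", tok))
        else:
            result.append(("PUNCT", tok))
    return result


for kind, text in classify("struct Light { float3 dir; }; Light sun;"):
    print(kind, text)
```

With the table in place, `type : builtInType | ID` can become `type : builtInType | TYPE`, which removes the ambiguity between a type name and an ordinary identifier. The sketch ignores scoping and numeric literals.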

Thanks,
Sebastien

--
_________________________________________________
Sebastien St-Laurent (http://blogs.msdn.com/sebby1234)
Owner of Paradoxal Press (http://www.ParadoxalPress.com)
Author of "Shaders for Game Programmers and Artists"
Author of "The COMPLETE Effect and HLSL Guide"
_________________________________________________
alexandre_mutel
Aug 30, 2011 at 2:10 PM
Edited Aug 30, 2011 at 2:24 PM

Funny coincidence: over the last few days I have also written a full Hlsl parser using Irony, used both for translating to glsl and for syntax analysis. Even funnier, I also came across the book "The Complete Effect and HLSL Guide" while working on dissecting the whole Hlsl grammar... ;)

We were previously working with a legacy ANTLR grammar that was working fine (though the grammar was incomplete), but I wanted to give Irony a chance (I have been following it almost from the beginning) and it did the job!

So far, the grammar I'm working on is complete and able to parse all DirectX SDK fx files as well as exotic fx files. Though it's for a private/commercial product, I can share a bit of my work:

  • Sebastien, I would recommend that you build your own Ast (not necessarily using Irony's infrastructure, as your Ast should be parser independent) and use the same Ast to perform the clean-up/transformation to the desired language (assuming the destination language is very similar: c-like syntax, etc.).
  • All the analysis work (type resolution, type inference) should be done on the Ast, as the analysis is fairly complex when you have to deal with type inference (type inference is necessary if you intend to generate Glsl, since you need to add, for example, casts in the proper places where Hlsl performs lots of implicit casts).
  • Also, for type resolution, it is possible to avoid doing it at parsing time by post-processing the Ast (this means that "type" should also be an identifier... but then you will have more conflicts to resolve).
  • For conflict resolution, you should probably rely on CustomGrammarHint, as it provides a basic infrastructure for implementing your own conflict resolver (the defaults from Irony are not enough in certain cases).
  • You will have to write some custom terminal matchers if you intend to parse all the corners of the Hlsl grammar (like asm {} inline blocks).
  • Integrating a preprocessing grammar into the Hlsl grammar is not a practical solution. You need to preprocess the source, either by using an external preprocessor or by using some of Irony's plumbing (TokenFilters, custom terminals, etc.).
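To illustrate the point about type inference and casts, here is a small Python sketch of an AST pass that makes an implicit conversion explicit. The node classes (`Var`, `Cast`, `Assign`) and the functions are hypothetical and deliberately tiny; a real pass would cover every expression kind and the full HLSL conversion rules. The example relies on GLSL having no implicit float-to-int conversion, so an assignment that HLSL accepts needs an explicit constructor call in the output.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str
    type: str


@dataclass
class Cast:
    target: str
    expr: object


@dataclass
class Assign:
    target: Var
    value: object


def infer(expr) -> str:
    """Return the static type of an expression node."""
    if isinstance(expr, Var):
        return expr.type
    if isinstance(expr, Cast):
        return expr.target
    raise TypeError(f"cannot infer type of {expr!r}")


def insert_casts(stmt: Assign) -> Assign:
    """Wrap the right-hand side in a Cast when its type differs from the target."""
    if infer(stmt.value) != stmt.target.type:
        return Assign(stmt.target, Cast(stmt.target.type, stmt.value))
    return stmt


def emit(node) -> str:
    """Print the (already analysed) tree as GLSL-like text."""
    if isinstance(node, Var):
        return node.name
    if isinstance(node, Cast):
        return f"{node.target}({emit(node.expr)})"
    if isinstance(node, Assign):
        return f"{emit(node.target)} = {emit(node.value)};"
    raise TypeError(f"unknown node {node!r}")


stmt = Assign(Var("count", "int"), Var("weight", "float"))
print(emit(insert_casts(stmt)))
```

Keeping the analysis (`infer`, `insert_casts`) separate from the printer (`emit`) is what makes the Ast reusable regardless of which parser produced it.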

Concerning C++ preprocessing, I used to handle it with the DirectX preprocessor included in D3dcompiler_xx.dll, but I would like to investigate whether Irony is capable of it.

There are some issues on which I would like your advice:

  1. It seems that SourceLocation doesn't contain any filename reference, and this is annoying: suppose I plug in a TokenFilter that will expand tokens based on an external file; it is important to track which file a token comes from (and not only its line/column). Do you think adding a filename to SourceLocation is fine?
  2. A TokenFilter could change the current location (only line/filename) of a SourceStream, but I'm not sure this works well with the method SourceStream.MoveLocationToPreviewPosition, which is called later and modifies the Location just after a token is produced (though this method just seems to add a delta to SourceLocation.Line). Do you think there is a problem with modifying the SourceLocation from a TokenFilter?

What do you think?

[Edit] Erratum: for issue 2 with TokenFilter/preprocessing, it seems better to let the TokenFilter modify the token's SourceLocation without modifying the SourceStream.Location. [/Edit]
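A rough Python sketch of the idea behind both questions: each token carries its own location including a file name, and a filter stage replaces an include directive with tokens from the other file without touching any shared stream position. `Location`, `Token`, `tokenize`, and `include_filter` are hypothetical names and do not correspond to Irony's SourceLocation or TokenFilter classes; the whitespace tokenizer is only a placeholder.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Location:
    file: str
    line: int
    column: int


@dataclass(frozen=True)
class Token:
    text: str
    location: Location


def tokenize(file: str, source: str) -> Iterator[Token]:
    """Placeholder scanner: splits on whitespace, so punctuation stays attached."""
    for line_no, line in enumerate(source.splitlines(), start=1):
        col = 0
        for word in line.split():
            col = line.index(word, col)
            yield Token(word, Location(file, line_no, col + 1))
            col += len(word)


def include_filter(tokens: Iterable[Token], files: Dict[str, str]) -> Iterator[Token]:
    """Replace '#include "name"' with the tokens of that file, locations intact."""
    it = iter(tokens)
    for tok in it:
        if tok.text == "#include":
            name = next(it).text.strip('"')
            # Included tokens keep the file/line/column of their own file
            yield from include_filter(tokenize(name, files[name]), files)
        else:
            yield tok


files = {
    "main.fx": '#include "common.fxh"\nfloat4 color;',
    "common.fxh": "float gamma;",
}
for tok in include_filter(tokenize("main.fx", files["main.fx"]), files):
    print(tok.location.file, tok.location.line, tok.location.column, tok.text)
```

Because the location lives on the token, the filter never has to adjust the position of the outer stream. The sketch has no guard against recursive includes or missing files.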

Aug 30, 2011 at 2:27 PM
Hi!

It's a small world after all. I just love how this book I wrote years
ago is still being used, even though it's DX9 specific. :)

What I am working on is also for commercial purposes, so I can't even
really say "why" I need to translate HLSL to GLSL due to NDA. But at
the moment we are bouncing around two ideas: either translate HLSL
directly to GLSL, or translate HLSL disassembly into GLSL. One has more
context information but also requires a more complex parser/AST; the
other has a simpler grammar and parsing structure but lacks some context
information, which I will have to get from the reflection interface in
D3DCompile. Oh, and I should mention that in my case I am looking at
having a parser that supports both DX9 and DX10, so it is another case
where parsing the disassembly may be easier.

I was planning on making my own AST nodes, likely deriving from an
ITranslate interface. That way I can take advantage of the AST
building capabilities of Irony and also expose the functionality that
is more relevant to translating rather than evaluating.
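As a sketch of what such a translation-oriented node interface could look like, here is a Python version using an abstract base class. `Translatable` stands in for the ITranslate idea above; it, `TranslationContext`, and the node classes are hypothetical and not part of Irony. The only domain fact used is that the HLSL intrinsic `lerp` corresponds to `mix` in GLSL.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class TranslationContext:
    """Hypothetical state shared by all nodes while emitting target code."""

    def __init__(self, renames: Dict[str, str]):
        self.renames = renames

    def rename(self, name: str) -> str:
        return self.renames.get(name, name)


class Translatable(ABC):
    """Nodes expose translate() instead of an interpreter-style evaluate()."""

    @abstractmethod
    def translate(self, ctx: TranslationContext) -> str:
        ...


class Identifier(Translatable):
    def __init__(self, name: str):
        self.name = name

    def translate(self, ctx: TranslationContext) -> str:
        return ctx.rename(self.name)


class Call(Translatable):
    def __init__(self, func: str, args: List[Translatable]):
        self.func = func
        self.args = args

    def translate(self, ctx: TranslationContext) -> str:
        # Children translate themselves; the parent only assembles the pieces
        args = ", ".join(arg.translate(ctx) for arg in self.args)
        return f"{ctx.rename(self.func)}({args})"


ctx = TranslationContext({"lerp": "mix"})
call = Call("lerp", [Identifier("a"), Identifier("b"), Identifier("t")])
print(call.translate(ctx))
```

The context object plays the role that the evaluation context plays in an interpreter, but carries only what translation needs.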

I started from a baseline grammar I found online, but it seems to be
full of errors, so I will have to make a good number of changes.
But so far I have been able to get all the conflicts resolved, thanks in
part to the new ResolveIf/ShiftIf constructs.

For the pre-processor, I am also leaning towards the D3DPreprocess
calls to start with. I am still interested in rolling my own, but
for the project the priority is more towards getting GLSL translation
working ASAP. For dealing with things like #includes, I think
giving the stream reader the ability to stack streams on top of
each other may be the key (I have not looked into exactly how Irony
does the parsing yet, so I am speculating here). The
trickier part is macro expansion, since it almost needs to be done
before the actual token resolution, at least unless you simply
want to assume that macro substitution is only done on identifiers,
which may not be 100% correct.
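For reference, a text-level first pass that handles only the basic directives could look roughly like the Python sketch below. It makes exactly the simplifying assumption mentioned above: object-like macros are substituted on whole identifiers only. The `preprocess` function is hypothetical; it supports #define, #ifdef, #else and #endif, does not rescan expanded text, does not skip string literals or comments, and has no error handling for unbalanced directives.

```python
import re

# \b keeps the pattern from matching the "e5" inside a literal such as 1e5
IDENT = re.compile(r"\b[A-Za-z_]\w*")


def preprocess(source, defines=None):
    """Line-based pass: evaluate conditionals, then substitute macros."""
    macros = dict(defines or {})
    active = [True]  # one entry per open #ifdef; top of stack = current state
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#ifdef"):
            name = stripped.split()[1]
            active.append(active[-1] and name in macros)
        elif stripped.startswith("#else"):
            # Flip the branch, but only if the enclosing block is active
            active[-1] = active[-2] and not active[-1]
        elif stripped.startswith("#endif"):
            active.pop()
        elif stripped.startswith("#define"):
            if active[-1]:
                parts = stripped.split(None, 2)
                macros[parts[1]] = parts[2] if len(parts) > 2 else ""
        elif active[-1]:
            out.append(IDENT.sub(lambda m: macros.get(m.group(0), m.group(0)), line))
    return "\n".join(out)


src = """#define WIDTH 4
#ifdef USE_FOG
float fog;
#else
float clear;
#endif
float data[WIDTH];"""
print(preprocess(src))
```

Running this as a separate pass before the parser keeps the grammar free of directives, at the cost of losing the original line numbers unless they are tracked separately.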
