Reclaiming Commas from MakeListRule, MakePlusRule, and MakeStarRule

May 11, 2015 at 9:08 PM
Although I see Commas in the list of Tokens parsed, I do not see Commas in the ParseTree output.

Is it because I am using the three list-rule methods (MakeListRule(), MakePlusRule(), and MakeStarRule()) to create the rules that parse lists that the Commas are not present in the ParseTree output?

Is there a straightforward way to add the Commas to the ParseTree output while still using the list-rule methods?
Coordinator
May 12, 2015 at 4:42 PM
It's on purpose: the list members (nodes) are really what we need, and commas/separators are just syntactic noise. You can just iterate the tree after parsing and inject the commas into the nodes' child lists.
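A minimal sketch of that post-parse pass, written in Python to stay independent of Irony's actual classes. The `Node` type, the `term` names, and `inject_separators` are hypothetical stand-ins for illustration, not Irony's API; the only assumption is that each tree node exposes a mutable list of children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a parse tree node (not Irony's ParseTreeNode)."""
    term: str
    text: str = ""
    children: list = field(default_factory=list)


def inject_separators(node, list_terms, separator=","):
    """Walk the tree depth-first and re-insert a separator node
    between the members of every node whose term is in list_terms."""
    for child in node.children:
        inject_separators(child, list_terms, separator)
    if node.term in list_terms and len(node.children) > 1:
        rebuilt = []
        for index, child in enumerate(node.children):
            if index > 0:
                # Synthetic node: it carries the separator text only.
                rebuilt.append(Node(term="comma", text=separator))
            rebuilt.append(child)
        node.children = rebuilt
    return node


params = Node("paramList", children=[
    Node("id", "varA"), Node("id", "varB"), Node("id", "varC"),
])
inject_separators(params, {"paramList"})
print([c.text for c in params.children])
# ['varA', ',', 'varB', ',', 'varC']
```

The injected nodes are synthetic, so they carry no source position; if positions matter, they would have to be recovered from the token list.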
May 12, 2015 at 9:25 PM
Edited May 12, 2015 at 9:27 PM
Thanks for the response.

I believe I understand what to do, and I probably need to do something similar with comments: preserve each comment's position in the parsed token stream by assigning every token a sequence number. The end goal is finer control over whether a comment is placed before or after a specific token. This matters because I am developing an Intel PL/M to C-language translator. Many of the PL/M commas will become semicolons in the C-language translation. The PL/M code I need to translate has many end-of-line comments (/* ... */), one after each declared Literal or Variable element.

Example:

Intel PL/M 386 Code
proc1:  Procedure (varA, varB, varC) BYTE;
   Declare
      varA BYTE,              /* varA is BYTE */
      varB INTEGER,        /* varB is INTEGER */
      varC UserType,       /* varC is user defined type */ 
      varLocal STRING;   /* varLocal is local variable */

      return varLocal;
end proc1;
K&R Style C Function
BYTE proc1 (varA, varB, varC)
BYTE varA;              /* varA is BYTE */
INTEGER varB;           /* varB is INTEGER */
UserType varC;          /* varC is user defined type */
{
     BYTE varLocal;     /* varLocal is local variable */

     return varLocal;
}
Coordinator
May 13, 2015 at 5:39 PM
Edited May 13, 2015 at 5:41 PM
Well, may I suggest a slightly different approach?
As far as I can guess, you are parsing the source into a parse tree, then trying to manipulate the parse tree 'in place', and finally generating the output from that parse tree. So you are trying to generate output in a different language directly from the CST (concrete syntax tree) of the input language.
I suggest using an AST (abstract syntax tree), as most compilers and interpreters do. Create a set of AST node classes, each representing an abstract construct such as PROC, with a virtual method for writing output in the target C language. You'll provide an override of this WriteOutput method in each node type. So you take the CST generated by the parser and build the AST from it.
Using an AST lets you structure your code better, I think, and again, this is a standard, experience-proven way to go. In this case, things like commas are syntactic noise that will disappear in the AST anyway, so it does not matter that they are removed even earlier, when the parser builds the CST (parse tree). The semicolons that go 'instead of' the commas are generated directly when you write the output.
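A rough sketch of that AST idea, in Python for brevity (the same shape carries over to C# with a virtual WriteOutput method). The class names `AstNode`, `VarDecl`, and `Proc` and their fields are hypothetical, and the AST is built by hand here rather than from a real parse tree. The point it illustrates is that no comma is stored anywhere: the separators in the output come from the writer.

```python
from dataclasses import dataclass


class AstNode:
    """Base class; each node type overrides write_output()."""

    def write_output(self) -> str:
        raise NotImplementedError


@dataclass
class VarDecl(AstNode):
    name: str
    type_name: str

    def write_output(self) -> str:
        # The semicolon is produced here; the AST stores no separator.
        return f"{self.type_name} {self.name};"


@dataclass
class Proc(AstNode):
    name: str
    return_type: str
    params: list   # VarDecl nodes for the parameters
    locals_: list  # VarDecl nodes for local variables
    body: list     # statements, already translated to C text

    def write_output(self) -> str:
        names = ", ".join(p.name for p in self.params)
        lines = [f"{self.return_type} {self.name} ({names})"]
        lines += [p.write_output() for p in self.params]  # K&R-style parameter declarations
        lines.append("{")
        lines += ["    " + d.write_output() for d in self.locals_]
        lines += ["    " + stmt for stmt in self.body]
        lines.append("}")
        return "\n".join(lines)


proc = Proc(
    name="proc1",
    return_type="BYTE",
    params=[VarDecl("varA", "BYTE"), VarDecl("varB", "INTEGER"), VarDecl("varC", "UserType")],
    locals_=[VarDecl("varLocal", "BYTE")],
    body=["return varLocal;"],
)
print(proc.write_output())
```

Each declaration is written in C's type-then-name order. Comments could be carried as an extra field on each node and emitted by the same method.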
As for comments: the problem for you is that in Irony the parser attaches comments to the first non-comment node AFTER the comment. I'm actually not sure what happens with comments inside lists; I think they'll survive, but attached to the node after. You'll need to create an intelligent process that finds these end-of-line comments and reattaches them to the node BEFORE. I think you can do this at the parse tree level.
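One possible shape for that reattachment step, sketched in Python over a flat token list rather than Irony's parse tree. The `Tok` record and its fields are hypothetical; the sketch assumes each token carries a sequence number and a source line. The rule used is an assumption about what "end-of-line comment" should mean: a comment belongs to the previous code token if both sit on the same line, otherwise to the next code token.

```python
from dataclasses import dataclass, field


@dataclass
class Tok:
    """Hypothetical token record (not an Irony type)."""
    seq: int    # position in the token stream
    line: int   # source line number
    kind: str   # "code" or "comment"
    text: str
    leading_comments: list = field(default_factory=list)
    trailing_comments: list = field(default_factory=list)


def reattach_comments(tokens):
    """Return (code_tokens, leftover_comments).

    A comment on the same line as the preceding code token becomes
    that token's trailing comment; any other comment becomes a
    leading comment of the next code token."""
    code_tokens = []
    pending = []  # comments waiting for the next code token
    for tok in sorted(tokens, key=lambda t: t.seq):
        if tok.kind == "comment":
            prev = code_tokens[-1] if code_tokens else None
            if prev is not None and prev.line == tok.line:
                prev.trailing_comments.append(tok.text)
            else:
                pending.append(tok.text)
        else:
            tok.leading_comments.extend(pending)
            pending = []
            code_tokens.append(tok)
    # Comments after the last code token have nothing to attach to.
    return code_tokens, pending


tokens = [
    Tok(1, 1, "comment", "/* declarations */"),
    Tok(2, 2, "code", "varA"),
    Tok(3, 2, "code", "BYTE"),
    Tok(4, 2, "comment", "/* varA is BYTE */"),
    Tok(5, 3, "code", "varB"),
]
code, leftover = reattach_comments(tokens)
print(code[1].text, code[1].trailing_comments)
# BYTE ['/* varA is BYTE */']
```

In a real declaration list the last code token on the line is usually the comma or semicolon, so a later pass would probably need to move the trailing comment from the separator onto the declaration it follows.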
Roman