Macro expansion

Nov 4, 2009 at 10:02 AM
Edited Nov 5, 2009 at 2:31 PM

Hi Roman,

I have just found Irony. Looks like a really promising project!

Following the previous discussion "Intext macro", but not strictly related to the same thing, I was trying to figure out how to implement macro expansion like the one used in some x86 asm assemblers:

 

mymacro macro param1, param2
&param1: mov eax, &param2
endm

.code
mymacro A, ebx
inc [eax]
ret

When macro expansion is done, the result is:

.code
A:
mov eax, ebx
inc [eax]
ret

So mymacro is part of a kind of "pre-processing stage", where all macros are expanded in a first pass. The macros can perform complex operations, like for/loop/if/else... etc., much more than what we are used to finding in the C/C++ preprocessor. It's a kind of language inside a language.

Once the macro expansion is done, on the second pass, we are able to parse asm instructions and translate them to opcodes.

Is such a feature available, or might it be possible (in the future?) with Irony?

I'm not talking about something that does the macro expansion for me. The question is more whether Irony is able to handle this with additional code (without modifying the "lexer/parser" core code?).

Thanks!

Coordinator
Nov 5, 2009 at 2:32 AM

Hard to say, without seeing more of the macro language, but I'm afraid there's no support in Irony for this so far.

If it were for a simple "injection with param-replacement" macro like in C, I would suggest the following path. First, you need a non-terminal for the macro definition. When this NonTerminal is reduced (parse node created), custom hooked code should grab the macro definition and store it in some convenient form, in some list - which should be available to the macro expander - see below.

Then, to expand the defined macros, I think the easiest solution would be a special token filter. It should intercept the token stream, and if an identifier token matches the name of a macro, the expander jumps into action. It should read the macro parameters from the input stream. After replicating the tokens from the macro definition body and replacing the parameters, the filter/expander passes the resulting expansion tokens down the token stream.
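
A minimal sketch of that token-filter idea (TokenFilter, Token and BeginFiltering do exist in Irony, but exact signatures can differ between versions; the MacroBodies dictionary, the way definitions get registered, and the "Identifier" terminal name are purely illustrative, and parameter substitution is omitted; usings for System.Collections.Generic and Irony.Parsing assumed):

        public class MacroExpansionFilter : TokenFilter
        {
            // Filled by the hook that fires when a macro-definition non-terminal is reduced.
            public Dictionary<string, List<Token>> MacroBodies = new Dictionary<string, List<Token>>();

            public override IEnumerable<Token> BeginFiltering(ParsingContext context, IEnumerable<Token> tokens)
            {
                foreach (Token token in tokens)
                {
                    List<Token> body;
                    if (token.Terminal.Name == "Identifier" && MacroBodies.TryGetValue(token.Text, out body))
                    {
                        // Macro call: replay the stored definition tokens instead of the identifier.
                        // A real expander would also consume the argument tokens here and substitute parameters.
                        foreach (Token bodyToken in body)
                            yield return bodyToken;
                    }
                    else
                    {
                        yield return token;
                    }
                }
            }
        }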

That would be my plan if I were doing this now. Hope to get to macros soon, they are on my to-do list.

Roman

 

Nov 5, 2009 at 10:54 AM
Edited Nov 5, 2009 at 2:37 PM

Thanks for your prompt reply!

A TokenFilter is indeed nice, but probably not powerful enough to handle such macro expansion.

I tried to play with Irony and did a simple macro expansion implementation like this:

- When a MacroDeclaration is found in the parser (just after a reduce), call a callback to compile it (this callback is language dependent). The callback should have access to a partial tree (I did a test with the ParserStack).

- When a MacroCall reduce is found, call a callback that will expand from a previously compiled MacroDeclaration and modify the position and text of Parser.Context.SourceStream, and notify the parser not to add the MacroCall TreeNode to the stack.

I have slightly modified the Irony source (added public members for SourceStream and ParserStack on Context, added a ReduceEvent on Parser, and called ReduceEvent from CoreParser in ExecuteReduce).

The following code performs a double macro expansion. The first macro called is my_macro, which contains another macro definition that is in turn invoked via my_submacro.

 

        static void Main()
        {
            AsmGrammar g = new AsmGrammar();
            Parser parser = new Parser(g);
            parser.ReduceEvent += parser_ReduceEvent;
            var textToParse =
@"my_macro MACRO
my_submacro MACRO
test_inline
endm
endm
my_macro
my_submacro
end
";
            ParseTree t = parser.Parse(textToParse);
            Console.WriteLine(t.ToXml());
        }

        static Dictionary<string, ParseTreeNode> mapMacroNameToTreeNode = new Dictionary<string, ParseTreeNode>();

        static void parser_ReduceEvent(object sender, ParserEventArgs e)
        {
            Parser parser = (Parser)sender;

            // Handle Macro declaration while parsing
            if (e.Node.Term.Name == "MacroDir")
            {
                bool isTopLevelMacro = true;
                // First child of MacroDir is the Macro name
                for (int i = parser.Context.ParserStack.Count - 1; i >= 0; i--)
                {
                    BnfTerm term = parser.Context.ParserStack[i].Term;
                    if (term != null && term.Name == "MacroStmtList")
                    {
                        isTopLevelMacro = false;
                        break;
                    }
                }
                if (isTopLevelMacro)
                {
                    mapMacroNameToTreeNode.Add(e.Node.FirstChild.Token.ValueString, e.Node);
                }
            }
            else // Perform Macro expansion while parsing
            if (e.Node.Term.Name == "MacroCall")
            {
                SourceStream sourceStream = parser.Context.SourceStream;
                ParseTreeNode macroCallNode = e.Node;

                string macroName = e.Node.FirstChild.Token.ValueString;

                ParseTreeNode macroDeclaration;
                if (mapMacroNameToTreeNode.TryGetValue(macroName, out macroDeclaration))
                {
                    ParseTreeNode macroBodyNode = FindChildTreeNodeByTermName(macroDeclaration, "MacroBody");

                    string expandedText = sourceStream.Text.Substring(macroBodyNode.Span.Location.Position,
                        macroBodyNode.Span.Length);

                    // Get MacroCall Span
                    int cutFrom = macroCallNode.Span.Location.Position;
                    int cutLength = macroCallNode.Span.Length;
                    int cutTo = macroCallNode.Span.EndPosition;

                    // Substitute the MacroCall with the macro body text.
                    // The expanded text is taken as-is for the sample; with parameters it should be
                    // generated from the previously stored MacroDir.
                    StringBuilder newTextBuilder = new StringBuilder(sourceStream.Text.Substring(0, cutFrom));

                    // Add expanded text
                    newTextBuilder.Append(expandedText);

                    // Add Tail
                    newTextBuilder.Append(sourceStream.Text.Substring(cutFrom + cutLength,
                        sourceStream.Text.Length - cutTo));

                    // Reset location to start of MacroCall
                    sourceStream.Location = new SourceLocation(cutFrom, macroCallNode.Span.Location.Line, macroCallNode.Span.Location.Column);
                    sourceStream.Text = newTextBuilder.ToString();

                    // Set NodeIgnore to true => macroCallNode will not be appended to the stack
                    e.NodeIgnore = true;
                }
            }
        }

 

The only thing I have modified in CoreParser.ExecuteReduce is handling NodeIgnore, in order to not push the TreeNode if the parser callback has done some expansion:

 

            // Shift to new state (LALR) or push new node into input stack (NLALR, NLALRT)
            if (Data.ParseMethod == ParseMethod.Lalr)
            {
                // Call Parser.OnReduceEvent
                ParserEventArgs parserEvent = new ParserEventArgs(newNode);
                Parser.OnReduceEvent(parserEvent);

                // If the node should be ignored, then don't add it to the parser stack (macro expansion is probably in action)
                if (!parserEvent.NodeIgnore)
                {
                    // execute shift over non-terminal
                    var action = Context.CurrentParserState.Actions[reduceProduction.LValue];
                    Context.ParserStack.Push(newNode, action.NewState);
                    Context.CurrentParserState = action.NewState;
                }
            }

 

The text effectively parsed by the parser in the end, and from which the tree is generated, is:

 

var expandedText =
@"my_macro MACRO
my_submacro MACRO
test_inline
endm
endm
my_submacro MACRO
test_inline
endm
test_inline
end
"
;

 

From this simple example, it seems quite easy to add macro expansion, although it's probably not a robust solution. Anyway, the most important thing is to have a macro expansion that has access to the current TreeNode stack in order to perform precompilation while parsing.

Hope this macro expansion can help someone.

Irony is impressive! I guess the more "callbacks" we are able to plug into the scanner-parser chain, the more flexible Irony will be for implementing complex things!

([Edit]Although the "callback" feature doesn't need to be plugged into the Parser; it could be plugged into the grammar, with actions attached (like ANTLR actions) to a NonTerminal element event (OnReduce... etc.).[/Edit])

 

Coordinator
Nov 6, 2009 at 11:50 PM

Kudos! You did it - congratulations!

A bit hacky, but as long as it works for you - great!

Reduce event on NonTerminals - good point, I will see how to do this. There is already a parser event, ParserAction or something, that you can use (looks like you didn't - you added a Reduce event), but providing a Reduce action at the non-terminal level would probably be useful as well.

As for the details of your solution to the macro challenge - I've looked at it closely, trying to see if I can borrow something for general macro support. I see one big trouble: when expanding a macro, you modify the source text in the scanner - you prepend the expanded macro body to the remaining text. I think this might be a problem. The process might start generating a lot of new BIG strings, and thus the garbage collector must collect much more frequently. Just as an example: let's say your program defines a simple macro symbol like "YES = 1" at the beginning of a 20k source file, and then starts using the symbol heavily. On each occurrence of the symbol, the macro expander would create a copy of a substantial part of the source string, prepend it with "1", and place it into the source. Quite a lot of garbage.

A better solution, I think, would be to prepend the source with tokens - a copy of the body expansion tokens. The expander can push them into the input buffer of the parser. So I still think that a TokenFilter is the right place for a generic implementation of a macro processor - in this case it can be optionally added when needed. The token filter has access to all the information necessary - even the ParserStack that you use in your code to detect top-level definitions. I really hope macros can be done generically without such extremes - it is hard to generalize this parser stack lookup. No criticism, just thinking...
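
To put rough, purely illustrative numbers on it: rebuilding the whole text on each expansion copies on the order of the file size (~20 KB here), so with, say, 500 uses of that one symbol the expander allocates roughly 500 × 20 KB ≈ 10 MB of short-lived strings for the collector to clean up.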

Your expander does not do any parameter substitution - that should be part of any macro processor. I understand that's not that difficult to add to your code, but that would complicate things a bit more.

One unusual thing I discovered in your example is nested macro definitions. I never thought about this. It is a bit puzzling for me how this works at all. Normally, in macro systems like C's, the body of the macro is just a sequence of tokens which does not have any programmatic structure until it is actually expanded in a particular place in the source. It means the parser reads the macro body in the definition as a plain stream of tokens until some predefined macro-end token. In your macro system, we may find a nested macro definition inside a macro body - it means the macro body should be parsed as well, and we should have a grammatical definition of its structure... You don't provide your grammar, but it would be really interesting to see the arrangements. Does your macro system allow defining a macro body as just an unstructured set of tokens? Really interesting...

Keep up the good hacking and come back for more - I might get to implementing macro support some time soon, although I'm not sure it would support nested definitions... :)

Thank you for adding your rating and review. I think your macro implementation is in fact a testimony that Irony is easy enough to understand and even hack into!

Roman

Nov 7, 2009 at 11:15 PM
Edited Nov 7, 2009 at 11:18 PM

Back on the subject.

For the performance issue, I guess it's possible to combine two techniques: a fast one, relying on the TokenFilter, when the token to expand is just a single var, and another one, more or less complex, that injects source code.

The first version of my test grammar was mixing the preprocessing and 2nd-pass grammars (the real asm instructions, "mov eax, edx"... etc.). But as you have noticed, I should be able to enter any text inside a macro, and this text is combined with macro statements.

Well, right now I'm trying to implement a grammar with Irony supporting unstructured tokens... but I'm stuck with reduce conflicts around the macro declaration. If I don't resolve this, I will post the full source (although this is just a grammar test).

To handle unstructured tokens, I have to keep all whitespace analysed (setting WhitespaceChars = "";), and I don't know if Irony is able to handle this correctly... (and I'm not sure I handle it correctly either... damn it).

I'm trying to combine preprocessing directives and unstructured tokens like this (I tried to do something similar to the ANTLR C preprocessor grammar file):

            Preprocess.Rule = PreprocessList;

            PreprocessList.Rule = MakeStarRule(PreprocessList, null, ProcLine);

            ProcLine.Rule = PreprocessDir | TokenLine;

            Token.Rule = WS
                | identifier
                | OtherKeywords
                | number
                | LBR | RBR | LBP | RBP
                | COMMA
                | POINT
                | COLON
                ;
            Tokens.Rule = MakeStarRule(Tokens, null, Token);
            TokenLine.Rule = Tokens + PreferShiftHere() + NL;

            OtherKeywords.Rule = ToTerm("mov") | "inc" | "eax" | "ebx" | "ecx" | "end" | "jmp" | ".code" | ".model" | ".data";

            PreprocessDir.Rule = MacroDir;

            //macroDir
            // id MACRO [[ macroParmList ]] ;;
            //macroBody
            //ENDM ;;
            MacroDir.Rule = identifier + WSP + "MACRO" + MacroParmListOpt + WSS + NL
                + MacroBody
                + "ENDM" + WSS + NL;

WS is a whitespace. WSP is WS+, WSS is WS*.

I have 4 or 5 shift conflicts in the grammar, and I get stuck on any line that starts with an identifier and is not a macro. Irony expects a "macro" keyword... I tried to put a PreferShiftHere() hint just before the "macro" keyword, but it didn't do anything...

Have you ever tried to write a grammar that has to handle whitespace explicitly?

About asm macros: yep, they are more complex than C preprocessor directives, and they even allow performing a double expansion at once: at the call site and on the preceding line. For example, with the following macro declaration:

mymacro macro 
.data 
mylabel:    dd 0
.code
    mov eax, edx
    exitm <ecx>
endm

.code
   mov mymacro(), ebx

you get the following preprocessed output:

.code
.data
mylabel: dd 0
.code
mov eax, edx
mov ecx, ebx

The <exitm> directive indicates that the macro is a macro function, and the text between the <> is the inline text replacement, while the text in the macro body is inserted just before the line where the macro function is called. So in terms of macro expansion, it's not as simple as I expected in Irony... The particularity of the asm grammar is that it works strongly on a line basis... With the unstructured tokens, I'm not sure how to declare this in Irony.

As for the macro expansion feature in Irony, I'm not sure you have to develop something in particular, or just some helper class (like you did for terminals) to make macro expansion work... although you could provide a standard C preprocessor as a separate grammar example in the Irony samples. But be careful not to consider the C preprocessor as the only macro system around (although I'm sure you don't!). Think of macro expansion as a templatizer system, and the templatizer could have a complex grammar to handle loops, repeats, conditionals... on text expansion.

Oh, but one thing that you'll probably have to provide for any macro expansion system is a way to generate syntax errors with correct line numbers...

 

Coordinator
Nov 8, 2009 at 1:47 AM

One quick piece of advice: "identity" rules might be a source of conflicts; you have the rule:

PreprocessDir.Rule = MacroDir;

which says PreprocessDir is the same as MacroDir.

Try removing PreprocessDir and using MacroDir instead.

 

Nov 10, 2009 at 1:23 PM
Edited Nov 10, 2009 at 1:46 PM

Removing the "identity" definition doesn't help. Here is the simple asm grammar I'm using (it contains many shift-reduce conflicts, due to whitespace and the identifier/macro definition) and the test that fails.

If you could have a look, it would be great!

 

using System;
using Irony.Parsing;

namespace IronAsm
{
    [Language("IronAsm", "1.0", "x86 MASM assembler")]
    public class AsmGrammar : Grammar
    {
        public AsmGrammar() : base(false)
        {
            GrammarComments = @"Simple MASM parser test.";

            // ----------------------------------------------------------------------------------------------------------
            // Tokens
            // ----------------------------------------------------------------------------------------------------------
            var NUMBER = new NumberLiteral("number");
            NUMBER.DefaultIntTypes = new TypeCode[] { TypeCode.Single, TypeCode.Int32, TypeCode.Int64 };

            var TEXT_LITERAL = new StringLiteral("String", "<", ">", StringFlags.AllowsAllEscapes);

            var IDENTIFIER = new IdentifierTerminal("Identifier", "_$?@", "_$?@");

            var COMMA = ToTerm(",","Comma");

            var POINT = ToTerm(".", "Point");

            var WS = new NonTerminal("Whitespace");
            WS.Rule = ToTerm(" ") | "\t";
            var WSP = WS.Plus();
            var WSS = WS.Star();

            var LBR = ToTerm("[");
            LBR.Options = TermOptions.IsOpenBrace;

            var RBR = ToTerm("]");
            RBR.Options = TermOptions.IsCloseBrace;

            var LBP = ToTerm("(");
            LBP.Options = TermOptions.IsOpenBrace;

            var RBP = ToTerm(")");
            RBP.Options = TermOptions.IsCloseBrace;
            
            var COLON = ToTerm(":", "Colon");

            // NewLine is explicitly handled in the grammar
            var NL = NewLine;

            // Comment
            var comment = new CommentTerminal("comment", ";", "\n", "\r", "\u2085", "\u2028", "\u2029");
            NonGrammarTerminals.Add(comment);

            // Set WhitespaceChars to an empty string. We handle whitespace in the grammar to keep the unstructured tokens organized
            WhitespaceChars = "";

            // No delimiters (same for unstructured tokens)
            Delimiters = "";

            // ----------------------------------------------------------------------------------------------------------
            // Non Terminals
            // ----------------------------------------------------------------------------------------------------------

            //var PreprocessDir = new NonTerminal("PreprocessDir");
            // var TextLine = new NonTerminal("TextLine");
            var MacroDir = new NonTerminal("MacroDir");
            var MacroParmList = new NonTerminal("MacroParmList");
            var MacroParmListOpt = new NonTerminal("MacroParmListOpt");
            var MacroBody = new NonTerminal("MacroBody");
            var MacroParm = new NonTerminal("MacroParm");
            var MacroStmtList = new NonTerminal("MacroStmtList");
            var MacroStmt = new NonTerminal("MacroStmt");
            var ParmTypeOptional = new NonTerminal("ParmTypeOptional");
            var ParmType = new NonTerminal("ParmType");
            var Exitm = new NonTerminal("Exitm");
            var ExitmOptional = new NonTerminal("ExitmOptional");
            //var MacroCall = new NonTerminal("MacroCall");
            //var MacroArgList = new NonTerminal("MacroArgList");
            //var MacroArgListOptional = new NonTerminal("MacroArgListOptional");
            //var MacroArg = new NonTerminal("MacroArg");
            var NonMacroKeywords = new NonTerminal("OtherKeywords");
            var PreprocessLine = new NonTerminal("ProcLine");
            var PreprocessList = new NonTerminal("PreprocessList");
            var UnStructuredToken = new NonTerminal("UnStructuredToken");
            var UnStructuredTokens = new NonTerminal("UnStructuredTokens");
            var UnStructuredTokenLine = new NonTerminal("UnStructuredTokenLine");

            // ----------------------------------------------------------------------------------------------------------
            // BNF Rules
            // ----------------------------------------------------------------------------------------------------------
            PreprocessList.Rule = MakeStarRule(PreprocessList, null, PreprocessLine);

            PreprocessLine.Rule = MacroDir | UnStructuredTokenLine;

            UnStructuredToken.Rule = IDENTIFIER
                            | WS
                            | NonMacroKeywords
                            | NUMBER
                            | LBR | RBR | LBP | RBP
                            | COMMA
                            | POINT
                            | COLON
                            ;
            UnStructuredTokens.Rule = MakeStarRule(UnStructuredTokens, null, UnStructuredToken);

            UnStructuredTokenLine.Rule = UnStructuredTokens + NL;

            // Partial test list of keywords
            NonMacroKeywords.Rule = ToTerm("mov") | "inc" | "eax" | "ebx" | "ecx" | "end" | "jmp" | ".code" | ".model" | ".data";

            //macroDir
            // id MACRO [[ macroParmList ]] ;;
            //macroBody
            //ENDM ;;
            MacroDir.Rule = IDENTIFIER + WSP + PreferShiftHere() + "MACRO" + MacroParmListOpt + WSS + NL
                //+ MacroBody
                + MacroStmtList
                + "ENDM" + WSS + NL;

            MacroParmListOpt.Rule = Empty | (WSP + MacroParmList);

            //macroParmList
            //    macroParm
            //    | macroParmList , [[ NL ]] macroParm 
            var comma_decl = new NonTerminal("comma_decl");
            comma_decl.Rule = COMMA + WSS + (Empty | NL) + WSS;  
            MacroParmList.Rule = MakeStarRule(MacroParmList, comma_decl, MacroParm);

            // macroParm := id [[ : parmType ]]
            MacroParm.Rule = IDENTIFIER + ParmTypeOptional;
            ParmTypeOptional.Rule = Empty | (COLON + ParmType);

            //parmType
            //    REQ
            //    | = textLiteral
            //    | VARARG
            ParmType.Rule = ToTerm("REQ")
                | "=" + TEXT_LITERAL
                | "VARARG";

            //macroBody
            //    [[ localList ]]
            //    macroStmtList 
            //MacroBody.Rule = MacroStmtList;

            //macroStmtList
            //    macroStmt ;;
            //    | macroStmtList macroStmt ;; 
            MacroStmtList.Rule = MakeStarRule(MacroStmtList, null, MacroStmt);

            // Incomplete macroStmt (just using macroDir and exitM)
            //macroStmt
            //  directive
            //  | exitmDir
            //  | : macroLabel
            //  | GOTO macroLabel
            MacroStmt.Rule = MacroDir
                | Exitm
                | UnStructuredTokenLine;
            
            //exitmDir:
            //  EXITM
            //  | EXITM textItem
            //RegexBasedTerminal regBasedTerminal = new RegexBasedTerminal("exitm-content","<.*>", "<");

            Exitm.Rule = ToTerm("EXITM") + PreferShiftHere() + ExitmOptional + NL;

            ExitmOptional.Rule = Empty | WSS + TEXT_LITERAL;

            ////macroCall
            ////    id macroArgList ;;
            ////    | id ( macroArgList )
            //MacroCall.Rule = IDENTIFIER + PreferShiftHere() + MacroArgListOptional;

            //MacroArgListOptional.Rule = "(" + MacroArgList + ")" |  WSP + MacroArgList | Empty;

            //var comma_arg = new NonTerminal("comma_nl");
            //comma_arg.Rule = COMMA + WSS;
            //MacroArgList.Rule = MakeStarRule(MacroArgList, comma_arg, MacroArg);

            //MacroArg.Rule = (NUMBER | TEXT_LITERAL | IDENTIFIER) + WSS;

            Root = PreprocessList;       // Set grammar root

            //MarkTransient(ParmTypeOptional, MacroArgListOptional);

            //automatically add NL before EOF so that our BNF rules work correctly when there's no final line break in source
            LanguageFlags = LanguageFlags.NewLineBeforeEOF;
        }
    }

}

 

I'm trying to parse the following text with the Irony Grammar Explorer:

 

mymacro macro
bla bla bla bla bla
endm
mymacro
blo blo
end

and it fails on line 2. The grammar expects the first "bla" to be followed by "macro".

 

I'm probably using Irony in a wrong way...

([Edit]I remember that I had to modify the StringLiteral class constructor to accept both start and end tokens.[/Edit])

Coordinator
Nov 10, 2009 at 5:06 PM

My first question: why do you need to clear the automatic whitespace chars in the Grammar and handle whitespace explicitly in grammar rules? I don't think it's really necessary, and it actually messes up your grammar a lot.

My guess is you wanted to process macro body lines as an unstructured stream of tokens/whitespace. I think you'd be much better off if you put auto whitespace back and handled macro body lines using FreeTextLiteral, with termination symbol LF.

var macroLine = new FreeTextLiteral("macroLine", "\n");

Next, you should declare "macro" and "endm" as reserved words using the MarkReservedWords call, so that when the scanner finds "endm" at the end of the macro, it will unambiguously scan it as a keyword, not as another macro body line.
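
For example (MarkReservedWords is an existing Grammar method; the word list here is just the two keywords mentioned above):

MarkReservedWords("macro", "endm");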

My guess is this switch will eliminate several of your conflicts involving whitespace.

Second, your main shift-reduce conflict is in state S0. Basically the nature of the conflict is the following. When the parser sees the first identifier token "a", it must decide immediately what kind of line it is:

*a b x y              -- unstructured token list

*a macro x y     -- macro header

("*" indicates current parser reading position). The way your grammar is structured forces parser either to create empty UnstructuredTokens nonterminal (for option 1) , or shift over macro name for macro header. Default action in conflict is shift, so it always chooses version #2, that's why it fails for you with "expected 'macro' " error message.

I think you can fix it by restructuring your grammar a bit. As far as I remember ASM syntax, regular commands start with an instruction code like "mov", "add", "sub", etc.

So do something like:

asmCommand.Rule = macroDef | command;

command.Rule = Instr + paramList;

Instr.Rule = "mov" | "add" | "sub"... etc;

macroDef.Rule = identifier + "macro" + idList + NewLine + macroBody + "endMacro";

macroBody.Rule = MakeStarRule(macroBody, macroLine); //macro line declared above

There is still a problem - nested macro defs: how to properly recognize endm as the end of the parent, not the nested macro. But let's leave it for now; let's make it work without nested macros.

Roman

Nov 10, 2009 at 8:12 PM
rivantsov wrote:

My first question: why do you need to clear the automatic whitespace chars in the Grammar and handle whitespace explicitly in grammar rules? I don't think it's really necessary, and it actually messes up your grammar a lot.

My guess is you wanted to process macro body lines as an unstructured stream of tokens/whitespace. I think you'd be much better off if you put auto whitespace back and handled macro body lines using FreeTextLiteral, with termination symbol LF.

You are probably right. I wanted to use whitespace in order to reconstruct the exact output for unstructured tokens.

I was thinking of implementing the assembler in 2 steps:

  • A 1st step using macro preprocessing (putting the asm instructions in only as keywords, without analysing them), which doesn't know anything about the asm language. This 1st step would interpret the macros, perform the expansion and output the result, reconstructing the source from the unstructured tokens (with correct whitespace).
  • A 2nd step parsing the asm language without any macro language in it (the macros having been stripped in the 1st step). A rough driver sketch follows below.
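
A rough sketch of that two-pass driver (only Parser/ParseTree/Parse are real Irony API here; AsmPreprocessorGrammar and ExpandMacros are hypothetical placeholders for the 1st step):

        static ParseTree AssembleTwoPass(string source)
        {
            // Pass 1: parse with the macro/preprocessor grammar and rebuild the source with macros expanded.
            var preprocessor = new Parser(new AsmPreprocessorGrammar()); // hypothetical preprocessor grammar
            ParseTree preTree = preprocessor.Parse(source);
            string expandedSource = ExpandMacros(preTree, source);       // hypothetical expansion/reconstruction step

            // Pass 2: parse the plain asm text (no macro constructs left) with the real asm grammar.
            var asmParser = new Parser(new AsmGrammar());
            return asmParser.Parse(expandedSource);
        }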

Good point for FreeTextLiteral, although I still need to parse statements inside a macro... and it seems it's not possible (yet) to mix unstructured and structured lines in Irony? (provided that an unstructured line doesn't match a valid structured line...)

rivantsov wrote:

Second, your main shift-reduce conflict is in state S0. Basically the nature of the conflict is the following. When the parser sees the first identifier token "a", it must decide immediately what kind of line it is:

*a b x y              -- unstructured token list

*a macro x y     -- macro header

("*" indicates the current parser reading position). The way your grammar is structured forces the parser either to create an empty UnstructuredTokens nonterminal (for option 1) or to shift over the macro name for the macro header. The default action in a conflict is shift, so it always chooses option #2; that's why it fails for you with the "expected 'macro'" error message.

This is something I don't fully understand, as I thought Irony was able to handle it with some kind of lookahead?

In ANTLR, there is a feature called syntactic predicates that follows a rule only if a group of tokens is found.

the "a b x y" vs "a macro x y" is a common case that should be easy for a lookahead no?

For example, how would you implement in Irony the distinction between a declaration of a C function and its implementation?

  • declaration : void myfunction(int param1);
  • implem: void myfunction(int param1) { .... }

Anyway, don't worry too much. I'm just evaluating the feasibility of such an assembler written with Irony... I don't actually expect to develop such an assembler, but if Irony helped a lot with that, it could change my mind (or someone else's! ;) )

 

Coordinator
Nov 11, 2009 at 3:33 AM

"..Good point for FreeTextLiteral, although i still need to parse statements inside a macro... and it seems not possible (yet) to mix unstructured and structured lines in Irony? (at the condition that an unstructured line doesn't match a valid structured line...) "

Well, for me the first problem is actually to define formally how the macro body should be parsed; honestly, so far I don't have a clear picture. Only after that can we try to express these rules in Irony.

"... This is something i don't fully understand, as i thought Irony was able to handle it with some kind of lookahead? In ANTLR, there is a feature called syntactic predicates that follow a rule only if a group of tokens are found. the "a b x y" vs "a macro x y" is a common case that should be easy for a lookahead no?... "

First of all, ANTLR is built on a quite different algorithmic foundation: it uses a top-down (LL) parser, while Irony follows a bottom-up (LALR) algorithm. So the out-of-the-box abilities of the algorithms are different. Generally, LALR algorithms are considered less restrictive, faster and overall preferable to LL. As one example, LALR has no problem with left-recursive rules, while LL cannot handle them at all; you have to refactor the grammar to get rid of them. As for using lookaheads, both approaches handle a single lookahead token as a facility of the algorithm itself. Bigger lookaheads, if needed, are usually handled as custom code hooked into the parser that kicks in at the point of indecision, looks ahead in the token stream and gives the parser advice. There is such a facility in Irony - look at the C# sample. There is custom lookahead code there that helps the parser decide what "<" in the input is - a comparison operator or an opening bracket for a type parameter. In your case, very similar code can do the same, looking for the "macro" keyword in the stream.
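
For reference, a rough sketch of what such a hook can look like, modelled loosely on the C# sample's conflict resolution - the exact member names (ConflictResolutionArgs, Scanner.BeginPreview/GetToken/EndPreview, ParserActionType) are from memory and may differ between Irony versions, and the terminal/keyword names are taken from the grammar above:

        public override void OnResolvingConflict(ConflictResolutionArgs args)
        {
            // Only intervene when the conflict happens on an identifier.
            if (args.Context.CurrentParserInput.Term.Name != "Identifier") return;

            args.Scanner.BeginPreview();
            Token preview = args.Scanner.GetToken();   // peek at the token following the identifier
            args.Scanner.EndPreview(true);             // end preview and rewind

            if (preview != null && string.Equals(preview.Text, "macro", StringComparison.OrdinalIgnoreCase))
                args.Result = ParserActionType.Shift;  // macro header: shift over the identifier
            else
                args.Result = ParserActionType.Reduce; // ordinary line: reduce the (empty) token list
        }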

Anyway, good luck with your research, let me know if you need anything else

Roman

 

 

 

Nov 11, 2009 at 10:00 AM
Edited Nov 11, 2009 at 10:13 AM
rivantsov wrote:

"..Good point for FreeTextLiteral, although i still need to parse statements inside a macro... and it seems not possible (yet) to mix unstructured and structured lines in Irony? (at the condition that an unstructured line doesn't match a valid structured line...) "

Well, for me the first problem is actually to define formally how the macro body should be parsed; honestly, so far I don't have a clear picture. Only after that can we try to express these rules in Irony.

My fault, you are right. Trying to quickly prototype without giving you a kind of "whole picture", and then asking why it's not working with Irony, is not fair! Moreover, I'm discovering that the macro body is only parsed when expansion occurs. It means you were right when you advised me to use FreeTextLiteral.

ASM parsing is simple in many ways, but assemblers also have a lot of weird syntax behaviours... Instead of working on the MASM syntax, I should probably develop an easier syntax to parse... I'll look at that option too.

rivantsov wrote:

First of all, ANTLR is built on a quite different algorithmic foundation: it uses a top-down (LL) parser, while Irony follows a bottom-up (LALR) algorithm. So the out-of-the-box abilities of the algorithms are different. Generally, LALR algorithms are considered less restrictive, faster and overall preferable to LL. As one example, LALR has no problem with left-recursive rules, while LL cannot handle them at all; you have to refactor the grammar to get rid of them. As for using lookaheads, both approaches handle a single lookahead token as a facility of the algorithm itself. Bigger lookaheads, if needed, are usually handled as custom code hooked into the parser that kicks in at the point of indecision, looks ahead in the token stream and gives the parser advice. There is such a facility in Irony - look at the C# sample. There is custom lookahead code there that helps the parser decide what "<" in the input is - a comparison operator or an opening bracket for a type parameter. In your case, very similar code can do the same, looking for the "macro" keyword in the stream.

Anyway, good luck with your research, let me know if you need anything else

Thanks! I missed the ResolveCode() in the C# grammar. I have tested it in the asm grammar and it's working great, even with unstructured token lines. So ResolveCode() + OnResolvingConflict() resolves the thing!

About LALR(1) and the "one token ahead": are you saying that "macro" in "Identifier + macro" is not considered the one token ahead of "Identifier"? Or does the token ahead count the current token? (Sorry for this dummy question, I haven't played with parsers in a while!)

The ResolveCode() + OnResolvingConflict() combination is really helpful and easy to use, but I'm wondering if it would be relevant for Irony to provide built-in simple disambiguation for one token ahead (or two, if the one token ahead is the current token)? But OK, that's not a priority. Hope that you will be able to push a stable 1.0 before the end of the year.

 

 

Coordinator
Nov 14, 2009 at 3:38 PM

Hi

"one token ahead", for "macro" in Identifier + macro - yes, you interpret it right, macro is not considered this one token lookahead when parser in the position before identifier, it is identifier itself. The trouble is that at this point Parser must make a decision between 2 or more alternatives, and all it sees is CurrentToken (identifier), which IS the single lookahead token.

About a built-in automatic lookahead facility for cases like this - I actually thought about it, and it may be possible, but I haven't figured out the details yet. I may come back to this in the future.

Roman