Resolving Identifiers and function call names to different terminals

Oct 25, 2012 at 6:46 PM
Edited Oct 25, 2012 at 6:47 PM

Hi,

I'm quite new to this so please bear with me.  I'm trying to extend the ExpressionEvaluatorGrammar to a language that I'm trying to parse.

The main thing I want to be able to do is create a different AstNode when I hit a function name than when I hit a standard Identifier.  In the expression grammar it creates an IdentifierNode in both cases. 

I thought to create another terminal which inherits off IdentifierTerminal, but creates a different AstNode.  I then tried to replace

FunctionCall.Rule = Expr + PreferShiftHere() + "(" + ArgList + ")";

with

FunctionCall.Rule = testTerminal + PreferShiftHere() + "(" + ArgList + ")";

However, this makes the parser try to match every identifier as a function call, and it complains that a '(' is missing after each one. [syntax error, expected: ( ]

I can see that it doesn't know how to decide which one to start parsing, but I can't work out how to use the hints to fix it up.

Can you provide any help?  (I'm sorry not to paste the grammar, but the Windows clipboard has died on me.)  The rest of the grammar is exactly the same as the ExpressionEvaluator example.

 

Oct 25, 2012 at 7:27 PM
Edited Oct 25, 2012 at 7:32 PM

It is better to attach the AST-node creation to the NonTerminals. I was surprised to see that the rule for function calls was this:

FunctionCall.Rule = Expr + PreferShiftHere() + "(" + ArgList + ")";

instead of

FunctionCall.Rule = identifier + PreferShiftHere() + "(" + ArgList + ")";

but in either case the AST node created for a FunctionCall is a FunctionCallNode:

var FunctionCall = new NonTerminal("FunctionCall", typeof(FunctionCallNode));

Oct 25, 2012 at 8:21 PM
Thanks for the amazingly quick response.

I was surprised at that too, but either seems to work, and I didn't want to confuse things in my example.

The underlying problem is that when I evaluate, I visit all the AST nodes and extract all the IdentifierNodes to be treated as variables. I then evaluate the statement nodes in variable dependency order.

You are right that the parent node is a FunctionCallNode, but the first child is an IdentifierNode containing the function name. I need to ignore this.

I can do this with some custom logic, but I liked the neatness of handling it in the grammar, and since I'm trying to learn more of Irony (which is pretty amazing btw), I thought it was a good opportunity.

I looked at ShiftIf but didn't really get it in the context of what I'm trying to do.

I guess I could also create a custom terminal based on the identifier terminal with a suffix of "(". That way the match attempt would fail for normal identifiers. Seemed a bit messy though.
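
To make the "suffix of (" idea concrete, here is a minimal sketch of the lookahead check such a terminal would need. It is plain C# and does not use Irony's terminal API; the class and method names are hypothetical.

```csharp
// Hypothetical helper, not part of Irony: decides whether the name that
// starts at 'start' is immediately followed by an opening parenthesis.
public static class NameClassifier
{
    public static bool IsFunctionName(string source, int start)
    {
        int i = start;

        // Consume the name itself (letters, digits, underscores).
        while (i < source.Length && (char.IsLetterOrDigit(source[i]) || source[i] == '_'))
            i++;
        if (i == start)
            return false; // no name at this position

        // Skip spaces between the name and a possible '('.
        while (i < source.Length && source[i] == ' ')
            i++;

        // Only a name followed by '(' counts as a function name.
        return i < source.Length && source[i] == '(';
    }
}
```

With the sample text `c=sin(2.2)`, the check at the position of `sin` succeeds, while for `b=a` the check at the position of `a` fails, so a plain identifier would be left to the ordinary identifier terminal.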


Oct 25, 2012 at 8:36 PM

Let's say you make the change to FunctionCall.Rule (replacing Expr with identifier). Now, when walking the AST tree, there are 6 node types that can have Identifier as a child. In 5 of those identifier is a variable or member reference, and in the last it is a FunctionCall reference. Doesn't that work?

Coordinator
Oct 25, 2012 at 8:51 PM

I think you're a bit confused - at least, in a scripting language like expr evaluator that's not the way it works. You want to pin down an identifier as a function call at the moment of parsing - and assign an appropriate node. The reality is that it's not the identifier itself that is the function: identifier is a reference to a named slot (memory location) in local or enclosing scope. It resolves to 'function call' only at runtime. For this, the interpreter reaches the identifier, pulls the value of the variable - it might be anything (a number, a function ref, or null), then it tries to perform a 'function call' invocation against the 'value' of the variable. The function itself (AST node containing function body) sits at the target, and 'name' in function call references this target node through a variable value. ExprEvaluator runtime simply defines all global functions as named global variables, just like any other 'x' or 'y' your script may define.

So basically, identifier is always a reference to a named runtime value - parser does not have to and does not need to assume anything else about it. Its AST node knows how to fetch the value (be it some number, string, or function ref). No need to put special FunctionCall AST into these identifiers.
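
A rough illustration of that runtime view, as a self-contained sketch rather than the actual ExprEvaluator code: the scope is a dictionary, a built-in function is stored as an ordinary value, and a call only discovers at run time whether the fetched value is callable. All names here are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical miniature of the idea; these are not Irony's interpreter classes.
public static class RuntimeLookupSketch
{
    // Global scope: every name is just a slot holding some value.
    public static Dictionary<string, object> CreateGlobals()
    {
        return new Dictionary<string, object>
        {
            ["x"] = 2.0,
            // A built-in function is stored like any other variable value.
            ["sin"] = (Func<double, double>)Math.Sin,
        };
    }

    // Evaluating an identifier only fetches whatever value sits in its slot.
    public static object EvaluateIdentifier(Dictionary<string, object> scope, string name)
    {
        if (!scope.TryGetValue(name, out var value))
            throw new InvalidOperationException($"Undefined name '{name}'.");
        return value;
    }

    // A call evaluates its target like any identifier, then checks at run time
    // whether the fetched value happens to be callable.
    public static double EvaluateCall(Dictionary<string, object> scope, string name, double arg)
    {
        var target = EvaluateIdentifier(scope, name);
        if (target is Func<double, double> function)
            return function(arg);
        throw new InvalidOperationException($"'{name}' is not a function.");
    }
}
```

In this sketch, calling `sin` works because its slot holds a delegate, while "calling" `x` fails at run time; the identifier lookup itself is identical in both cases.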

If you want to make some sort of 'static' function binding, then the best way to do this is after parsing, by analyzing (visiting) the AST tree and rewriting/replacing some nodes. 

Roman
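
As a sketch of the post-parse rewriting suggested above: a pass that walks the tree and replaces the identifier in the call-target position with a dedicated node type. The node classes are hypothetical stand-ins, not Irony's AST classes, and the assumption that the target is the first child comes from the description earlier in this thread.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for AST node classes; these are not Irony types.
public class AstSketchNode
{
    public List<AstSketchNode> Children { get; } = new List<AstSketchNode>();
}

public class IdentifierAst : AstSketchNode
{
    public string Name { get; }
    public IdentifierAst(string name) { Name = name; }
}

public class FunctionNameAst : AstSketchNode
{
    public string Name { get; }
    public FunctionNameAst(string name) { Name = name; }
}

// Assumption: Children[0] is the call target, the rest are arguments.
public class FunctionCallAst : AstSketchNode { }

public static class FunctionBinder
{
    // Walks the tree after parsing and swaps the call-target identifier
    // for a FunctionNameAst, leaving all other identifiers untouched.
    public static void Rewrite(AstSketchNode node)
    {
        if (node is FunctionCallAst
            && node.Children.Count > 0
            && node.Children[0] is IdentifierAst target)
        {
            node.Children[0] = new FunctionNameAst(target.Name);
        }

        foreach (var child in node.Children)
            Rewrite(child);
    }
}
```

After this pass, any later visitor that collects identifier nodes no longer sees function names, and the grammar itself stays unchanged.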

Oct 25, 2012 at 8:56 PM

Indeed. You are right. I had done something along those lines.

However, thinking ahead I wanted to be sure that if someone extended the grammar, it would be obvious that identifiers were only for use as variables.

As it stands the logic is hidden right down in my top-level AST node, a long way from the problem.

In general, is there a solution when two terminals are an equally good match?

Oct 25, 2012 at 9:20 PM

You are confusing the tokens identified by the Scanner with the Terminals and NonTerminals identified by the Parser and the AST nodes it builds. The definition of the token identifier is always going to be shared between the names of variables, members, and functions, unless the grammar changes fundamentally, which is not going to happen from an enhancement.

The evaluation functionality for a FunctionCall is always going to be rooted in the AST node FunctionCallNode.  I don't see the issue you are concerned with.

Oct 25, 2012 at 9:23 PM
Roman,

I see what you're saying. I sure am confused; it's a steep learning curve for me.

Just to explain what I am doing: I have a language which is very like the expression evaluator. A sample script:

b=a
a=1
c=sin(2.2)

Every line must be an assignment. When I evaluate, I create a list of dependencies between statements and evaluate in an order that will resolve. In the above I evaluate a=1, followed by b=a, followed by c=sin(2.2). To do this I scan the statement nodes and extract all the variables by finding the IdentifierNodes on the RHS. I then create a dependency tree for each line and run it from the bottom up.
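
The ordering step can be sketched as a depth-first topological sort. This is only an illustration under assumptions: each statement is reduced to its target name plus the variables read on its right-hand side, and all names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class DependencyOrder
{
    // Each statement is (target variable, variables read on its right-hand side).
    // Returns the targets in an order where every variable is assigned before it is read.
    public static List<string> Sort(IReadOnlyList<(string Target, string[] Reads)> statements)
    {
        var readsByTarget = new Dictionary<string, string[]>();
        foreach (var (target, reads) in statements)
            readsByTarget[target] = reads;

        var ordered = new List<string>();
        var state = new Dictionary<string, int>(); // 1 = in progress, 2 = done

        void Visit(string name)
        {
            if (!readsByTarget.ContainsKey(name))
                return; // not assigned anywhere in this script
            state.TryGetValue(name, out var current);
            if (current == 2)
                return; // already placed in the output
            if (current == 1)
                throw new InvalidOperationException($"Circular dependency involving '{name}'.");

            state[name] = 1;
            foreach (var dependency in readsByTarget[name])
                Visit(dependency); // dependencies are emitted first
            state[name] = 2;
            ordered.Add(name);
        }

        foreach (var (target, _) in statements)
            Visit(target);
        return ordered;
    }
}
```

For the sample script, passing the statements as b (reads a), a (reads nothing) and c (reads nothing) gives the order a, b, c, which matches the evaluation order described above.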

When a script contains a function call, I get an extra variable on the dependency tree for c, which is 'sin'. I can filter this out during evaluation by looking at the parent, but thought it would be neat to do this at the grammar stage by identifying it differently. It seemed like a slightly different thing to me.
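
For reference, the "look at the parent" filter amounts to something like the sketch below. It uses a hypothetical node shape rather than Irony's AstNode, and assumes the function name is the first child of the call node.

```csharp
using System.Collections.Generic;

// Hypothetical node shape for illustration; not Irony's AstNode.
public class ExprNode
{
    public string Kind { get; }   // e.g. "Identifier", "FunctionCall", "Number"
    public string Text { get; }
    public List<ExprNode> Children { get; } = new List<ExprNode>();

    public ExprNode(string kind, string text, params ExprNode[] children)
    {
        Kind = kind;
        Text = text;
        Children.AddRange(children);
    }
}

public static class VariableCollector
{
    // Collects the names of all identifiers used as variables under 'root'.
    public static HashSet<string> Collect(ExprNode root)
    {
        var names = new HashSet<string>();
        Walk(root, null, 0, names);
        return names;
    }

    private static void Walk(ExprNode node, ExprNode parent, int indexInParent, HashSet<string> names)
    {
        // Assumption: the first child of a FunctionCall is the function name.
        bool isCallTarget = parent != null
            && parent.Kind == "FunctionCall"
            && indexInParent == 0;

        if (node.Kind == "Identifier" && !isCallTarget)
            names.Add(node.Text);

        for (int i = 0; i < node.Children.Count; i++)
            Walk(node.Children[i], node, i, names);
    }
}
```

Because the walk passes the parent down, the decision is made from context rather than from the identifier node itself, so the grammar needs no second terminal.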

It sounds like it's best I stick with what I've got working right now. As I said, I was just keen to know if there was a 'right' way to do it, and indeed whether I can even do what I'm suggesting, if only for the purposes of learning more about writing grammars.

I haven't looked at the code since I thought of this, but the KeyTerm terminal must have exactly this problem, mustn't it? Is this treated as a special case, or handled generically with some grammar construct?

Thanks

Oct 25, 2012 at 9:48 PM

A pea doesn't know what type of pod it is in, just its own shape: spherical or ellipsoidal. The pod grows to accommodate the type of peas it contains.

Work top-down instead of bottom-up, and much context will be immediately at hand.

Oct 26, 2012 at 1:45 PM
Edited Oct 26, 2012 at 1:47 PM

I am working top down in my evaluation and that works for me.  Sorry - I seem to have got away from the original question, which is kind of independent of my problem, and I'm sure is pretty trivial to solve.  I've created an example to make it clear.  Say I have the grammar below:

      var identifier = new IdentifierTerminal("identifier");
      var letters = new LettersOnlyTerminal("test letters"); //letters a-z only

      var Expr = new NonTerminal("Expr");
      var comma = ToTerm(",");
     
      var Term = new NonTerminal("Term");
      var ArgList = new NonTerminal("ArgList", typeof(ExpressionListNode));
      var FunctionCall = new NonTerminal("FunctionCall", typeof(FunctionCallNode));
      var ObjectRef = new NonTerminal("ObjectRef"); // foo, foo.bar or f['bar']
      var Statement = new NonTerminal("Statement");
      var Program = new NonTerminal("Program", typeof(StatementListNode));
      var AssignmentStmt = new NonTerminal("AssignmentStmt", typeof(AssignmentNode));

      Expr.Rule = Term;
      Term.Rule = FunctionCall | identifier;
      ArgList.Rule = MakeStarRule(ArgList, comma, Expr);
      FunctionCall.Rule = letters + PreferShiftHere() + "(" + ArgList + ")";
      AssignmentStmt.Rule = ObjectRef + ToTerm("=") + Expr;
      ObjectRef.Rule = identifier;
      Statement.Rule = AssignmentStmt | Empty;
      Program.Rule = MakePlusRule(Program, NewLine, Statement);

      this.Root = Program; 

Here the LettersOnlyTerminal is similar to an identifier, but more specific: it can only contain the letters a-z. I've inserted it into the function call rule because I (hypothetically) only want function names in my language to contain letters. When I parse:

a=b
a=b()

I get an error because it thought b was an identifier in line two and didn't expect the brackets after it. (Syntax error, expected: [line break])

I know this is a trivial conflict problem and I should be able to work it out for myself, but it goes to the heart of my understanding. How do I rewrite my grammar so that it understands both? How does it prioritise 'identifier' over 'letters' in the first place?
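
One way to sidestep the overlap, as a sketch only: keep a single identifier terminal in the grammar (so "b" is never ambiguous to the scanner) and enforce the letters-only rule on the function name when the FunctionCall node is processed. The helper below is hypothetical and independent of Irony.

```csharp
using System.Linq;

// Hypothetical post-parse check; not part of Irony.
public static class FunctionNameRule
{
    // True when the name consists solely of the lowercase letters a-z.
    public static bool IsLettersOnly(string name)
    {
        return name.Length > 0 && name.All(c => c >= 'a' && c <= 'z');
    }

    // Returns an error message for an unacceptable function name,
    // or null when the name satisfies the rule.
    public static string Check(string functionName)
    {
        return IsLettersOnly(functionName)
            ? null
            : $"Function name '{functionName}' may only contain letters a-z.";
    }
}
```

With this approach both `a=b` and `a=b()` go through the same identifier terminal, and a name such as `b2()` would be rejected with a semantic error after parsing instead of a syntax error from the scanner.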

Oct 26, 2012 at 9:09 PM

Why do you desire to over-constrain (meaning, in many senses, break) your language this way?  It is possible to build parsers for grammars with such overlapping token definitions, but it is to be avoided whenever possible because it is (usually) an artificial and unnecessary complexity.

Oct 26, 2012 at 9:41 PM

I just wondered if there was some built in way of handling it or some pattern of grammar building I could use which would avoid it.

When I look more carefully at some of the questions others have asked, I see it has been asked in other contexts.

Since this is only a theoretical problem which I can work around, I'll leave you in peace.

Thanks again

David
