newbie question

Nov 20, 2009 at 6:32 AM

Hi

First, sorry for the simple question. I am new to Irony and I want to build a parser for ASL (ACPI Source Language).

Here is a snippet of the ASL definitions for name and pathname terms.

LeadNameChar := ‘A’-‘Z’ | ‘a’-‘z’ | ‘_’
DigitChar := ‘0’-‘9’
NameChar := DigitChar | LeadNameChar
RootChar := ‘\’
ParentPrefixChar := ‘^’
PathSeparatorChar := ‘.’
CommaChar := ‘,’
SemicolonDelimiter := Nothing | ‘;’

NameSeg := <LeadNameChar>
     | <LeadNameChar NameChar>
     | <LeadNameChar NameChar NameChar>
     | <LeadNameChar NameChar NameChar NameChar>
     
NameString := <RootChar NamePath> | <ParentPrefixChar PrefixPath NamePath> | NonEmptyNamePath
NamePath := Nothing | <NameSeg NamePathTail>
NamePathTail := Nothing | <PathSeparatorChar NameSeg NamePathTail>
NonEmptyNamePath := NameSeg | <NameSeg NamePathTail>
PrefixPath := Nothing | <ParentPrefixChar PrefixPath>

I defined rootChar, commaChar, etc. as:
KeyTerm commaChar = ToTerm(",", "commaChar");
KeyTerm parentPrefixChar = ToTerm("^", "parentPrefixChar");
KeyTerm pathSeperateorChar = ToTerm(".", "pathSeperateorChar");
KeyTerm rootChar = ToTerm("\\","rootChar");

Am I right? Also, I don't know how to define LeadNameChar, DigitChar, and NameSeg. Can anybody help me?

Coordinator
Nov 20, 2009 at 5:09 PM

Well, you should first be aware of the two-step parsing process:

1. Scanning - recognizing chunks of characters such as numbers, identifiers, and keywords, and combining them into tokens. This is also called lexical analysis, and the rules for this operation are lexical rules.

2. Parsing itself - reading the token stream from the scanner and recognizing language constructs such as expressions, statements, etc. This is syntax analysis.

When defining a grammar for a language, the language spec often describes both lexical rules and syntax rules in the same manner. However, for a language implementor it is important to separate these and to identify which are lexical rules and which are syntax rules (rules over tokens). This is true for any compiler tool, and it is important for Irony because lexical rules are supported internally in a different way. What you listed are lexical rules: rules over characters for combining them into tokens. In Irony these rules are implemented through predefined Terminal classes. So what you need to do is identify the appropriate terminal classes (IdentifierTerminal, NumberLiteral, etc.), instantiate them as local variables, and set the options on them that match your lexical rules.

It looks like you will probably need IdentifierTerminal, NumberLiteral, and maybe some others.

Try to work out which restrictions your lexical rules place on the terminals, and set the options of the Irony terminals accordingly. Follow the examples in the Irony.Samples grammars and in the TerminalFactory methods.
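To make the scanning/parsing split concrete, here is a minimal hand-written sketch in plain C#. It does not use Irony at all; the names TokKind, AslNameSketch, Scan and IsNameString are hypothetical and exist only for illustration. Scan applies the character-level rules (LeadNameChar, NameChar, the 4-character NameSeg limit), and IsNameString applies the token-level NameString rule from the snippet above.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: hand-written code, not the Irony API.
public enum TokKind { Root, Parent, Dot, NameSeg }

public static class AslNameSketch
{
    static bool IsLeadNameChar(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    static bool IsNameChar(char c)
    {
        return IsLeadNameChar(c) || (c >= '0' && c <= '9');
    }

    // Step 1 (lexical rules): group characters into tokens.
    public static List<TokKind> Scan(string text)
    {
        var tokens = new List<TokKind>();
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (c == '\\') { tokens.Add(TokKind.Root); i++; }
            else if (c == '^') { tokens.Add(TokKind.Parent); i++; }
            else if (c == '.') { tokens.Add(TokKind.Dot); i++; }
            else if (IsLeadNameChar(c))
            {
                int start = i;
                i++;
                while (i < text.Length && IsNameChar(text[i])) i++;
                if (i - start > 4)
                    throw new FormatException("NameSeg longer than 4 characters at position " + start);
                tokens.Add(TokKind.NameSeg);
            }
            else
            {
                throw new FormatException("Unexpected character '" + c + "' at position " + i);
            }
        }
        return tokens;
    }

    // Step 2 (syntax rules): check the token sequence against NameString.
    public static bool IsNameString(List<TokKind> t)
    {
        int p = 0;
        bool pathMayBeEmpty = false;
        if (p < t.Count && t[p] == TokKind.Root)
        {
            p++;                       // RootChar NamePath
            pathMayBeEmpty = true;
        }
        else if (p < t.Count && t[p] == TokKind.Parent)
        {
            while (p < t.Count && t[p] == TokKind.Parent) p++;  // ParentPrefixChar PrefixPath
            pathMayBeEmpty = true;
        }
        if (p == t.Count) return pathMayBeEmpty;    // NamePath := Nothing
        if (t[p] != TokKind.NameSeg) return false;
        p++;
        while (p < t.Count)                         // NamePathTail
        {
            if (t[p] != TokKind.Dot) return false;
            p++;
            if (p >= t.Count || t[p] != TokKind.NameSeg) return false;
            p++;
        }
        return true;
    }
}
```

For example, AslNameSketch.IsNameString(AslNameSketch.Scan("\\_SB.PCI0")) accepts the path, while Scan("ABCDE") fails in the scanning step, before any syntax rule is consulted. In Irony the first step would be handled by terminal classes and the second by the grammar rules.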

Nov 27, 2009 at 7:45 AM
Edited Nov 27, 2009 at 7:48 AM

Hi Roman

Thanks for the guidance.

I checked the IdentifierTerminal source code and also some examples, but I still don't know how to define NameSeg. It should work like a C# identifier but with a limited length (max 4 chars). Then I found the RegexBasedTerminal class and defined NameSeg as below:

RegexBasedTerminal NameSeg = new RegexBasedTerminal("\\b[a-zA-Z_][\\w_]{0,3}\\b","");

It compiles fine. However, when I use my grammar to parse some ASL files in the Grammar Explorer, the explorer keeps consuming memory until it halts my PC, and it never generates the parse tree. So, is there a bug in RegexBasedTerminal?

If I use a C# identifier as NameSeg:

var NameSeg = TerminalFactory.CreateCSharpIdentifier("NameSeg");

my grammar works fine and generates the right parse tree for a valid ASL file. However, it cannot detect invalid NameSegs with more than 4 chars.

I need your advice on how to fix my problem. Thanks.
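For checking the NameSeg rule on its own, outside Irony, a minimal sketch with the standard .NET Regex class could look like the following. The class and method names (NameSegRegexSketch, IsNameSeg) are hypothetical. The sketch assumes NameSeg is ASCII-only, as in the definition above, so it spells out the character ranges instead of using \w, which in .NET also matches non-ASCII letters and digits. It says nothing about the memory problem in the Grammar Explorer.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustration only: a standalone check, not an Irony terminal.
public static class NameSegRegexSketch
{
    // \A and \z anchor the pattern to the whole input string:
    // one LeadNameChar followed by at most three NameChars.
    static readonly Regex NameSeg = new Regex(@"\A[A-Za-z_][A-Za-z0-9_]{0,3}\z");

    public static bool IsNameSeg(string s)
    {
        return NameSeg.IsMatch(s);
    }
}
```

With this pattern, "PCI0" and "_SB" are accepted, while "0ABC", "ABCDE" and the empty string are rejected.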

Coordinator
Nov 29, 2009 at 6:01 PM

I think you should go with IdentifierTerminal. To reject names with length > 4, you should use the ValidateToken event. If the name is invalid, put an error token in place of the current token.

Like this:

// inside the grammar constructor
identifier.ValidateToken += identifier_ValidateToken;
...
// Replaces any identifier longer than 4 characters with an error token.
private void identifier_ValidateToken(object sender, ParsingEventArgs e) {
  if (e.Context.CurrentToken.ValueString.Length > 4)
    e.Context.CurrentToken = e.Context.Source.CreateErrorToken("Identifier cannot be longer than 4 characters");
}
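The idea behind this approach (scan the whole identifier first, then validate it and swap in an error token) can be sketched without Irony as follows. SketchToken and NameSegValidationSketch are hypothetical types invented for this illustration; they are not Irony's Token class or event arguments.

```csharp
using System;

// Illustration only: a stand-in for a scanner token, not Irony's Token type.
public sealed class SketchToken
{
    public string Text;
    public bool IsError;
    public string Message;
}

public static class NameSegValidationSketch
{
    public const int MaxNameSegLength = 4;

    // Runs after the identifier has been scanned in full.
    // Returns the token unchanged when it is valid, or an error token in its place.
    public static SketchToken Validate(SketchToken token)
    {
        if (token.IsError || token.Text.Length <= MaxNameSegLength)
            return token;
        return new SketchToken
        {
            Text = token.Text,
            IsError = true,
            Message = "Identifier cannot be longer than 4 characters"
        };
    }
}
```

Because the scanner still consumes the whole over-long name as one token, the error can be reported against that exact name instead of surfacing later as an unrelated syntax error.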

Dec 2, 2009 at 8:48 AM

Hi Roman,

Sorry, I still can't get your code to work. It looks like there is no definition of "ParsingEventArgs".

If I replace it with ValidateTokenEventArgs, then there is no CurrentToken definition.

I would appreciate your advice. Thanks.

Ak

Coordinator
Dec 2, 2009 at 3:49 PM

It looks like you need to get the latest source version from the Sources page.

Roman