Language Service

Jan 28, 2008 at 9:26 PM
Has anyone tried creating a Managed Language Service using Irony?

My experience in this field is limited, but recently I started integrating a language using Managed Babel. It uses Lex-like and Yacc-like syntax to create the parser and scanner files. However, I quickly realized that the Yacc file would become difficult to maintain and debug, so I started looking at alternatives. Using Irony might be the answer, but with Managed Babel everything was already set up for me, so I am struggling a little to figure out how to link my Irony grammar to a language service.
Coordinator
Jan 29, 2008 at 4:31 AM
Edited Jan 29, 2008 at 4:32 AM
Hi
I'm afraid there are no such examples; remember, the project is just 2 months old and hasn't been officially released yet.
I will definitely be looking into VS integration support; if you want a working example right now, look at the IronPython integration example in the VS SDK. For Irony specifically there is nothing for now, sorry. The problem is that this integration involves quite a lot of code, implementing a zillion classes and interfaces required by VS. It would be great if somebody started looking into this for Irony. Wanna try?! I will provide all the help I can while I'm working on completing the feature set for the 1.0 release.
thanks again
Roman

Jan 29, 2008 at 5:29 PM
Edited Jan 29, 2008 at 6:02 PM
Sure, I can give it a shot. I plan on starting with the scanner that is required by a language service.

public override IScanner GetScanner(Microsoft.VisualStudio.TextManager.Interop.IVsTextLines buffer)

Should I modify the existing scanner class for the compiler and create a new scanner class that implements IScanner? Or just create an entirely new class from scratch?

My plan is to modify the current scanner class in the compiler so that it has methods for reading tokens one at a time, much like the tokenizer class in IronPython's compiler.
Coordinator
Jan 29, 2008 at 7:36 PM
Edited Jan 29, 2008 at 7:37 PM
I think a good place to start is to implement Babel's grammar in Irony and make sure it parses correctly. As for where to place the IScanner implementation: I wouldn't mind extending the Scanner class with the required methods, except for one thing. Making Irony's Scanner implement VS IScanner would require the Irony assembly to directly reference VS integration assemblies, which are not part of the .NET Framework or a standard .NET installation. So to compile Irony, a developer would have to install the VS SDK, which is not good. Instead, let's create an integration assembly (Irony.Integration) and define the IScanner implementation class there, just a bridge class that redirects all calls to Irony's scanner. While you do the Babel grammar, I'll take a close look at VS IScanner and come up with more details on how we can proceed.
thanks again and good luck
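The bridge-class idea above can be sketched as follows. This is a minimal illustration under assumptions, not code from Irony or the VS SDK: `ILineScanner`, `ICoreScanner`, `WordScanner` and `ScannerBridge` are hypothetical names. `ILineScanner` only mirrors the general shape of the MPF `IScanner` (a per-line `SetSource(string, int)` plus one-token-at-a-time scanning), so the sketch compiles without the VS SDK. The point is the dependency direction: only the integration assembly knows about VS types, while the core scanner stays VS-free.

```csharp
using System;

// Hypothetical interface mirroring the shape of MPF's IScanner; the real bridge in
// Irony.Integration would implement Microsoft.VisualStudio.Package.IScanner instead.
public interface ILineScanner
{
    void SetSource(string source, int offset);
    bool ScanToken(ref int state, out int start, out int end);
}

// Hypothetical core-scanner contract: lives in Irony itself, no VS references.
public interface ICoreScanner
{
    void Reset(string text);
    // Returns false at end of input; start/end are inclusive offsets into the text.
    bool TryReadToken(out int start, out int end);
}

// Toy core scanner used only for this sketch: a token is a run of non-whitespace.
public sealed class WordScanner : ICoreScanner
{
    private string _text = "";
    private int _pos;

    public void Reset(string text) { _text = text ?? ""; _pos = 0; }

    public bool TryReadToken(out int start, out int end)
    {
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++;
        start = _pos;
        while (_pos < _text.Length && !char.IsWhiteSpace(_text[_pos])) _pos++;
        end = _pos - 1;
        return end >= start;   // an empty run means end of input
    }
}

// The bridge itself: forwards every call to the core scanner and translates offsets.
public sealed class ScannerBridge : ILineScanner
{
    private readonly ICoreScanner _core;
    private int _offset;

    public ScannerBridge(ICoreScanner core)
    {
        _core = core ?? throw new ArgumentNullException(nameof(core));
    }

    public void SetSource(string source, int offset)
    {
        _offset = offset;
        _core.Reset(source.Substring(offset));
    }

    public bool ScanToken(ref int state, out int start, out int end)
    {
        // 'state' is passed through untouched here; multi-line constructs would need it.
        bool found = _core.TryReadToken(out start, out end);
        start += _offset;   // report positions relative to the whole line
        end += _offset;
        return found;
    }
}
```

Because the bridge depends only on the small `ICoreScanner` contract, swapping the toy `WordScanner` for an adapter around Irony's real scanner would not change the VS-facing side.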
Jan 29, 2008 at 9:52 PM
Good, we are on the same page. Good idea to just use the Managed Babel Framework as a starting point. I have color highlighting working now. When I am finished modifying the framework to work with Irony, what should I do with the code?

Just for your info I had to add the following methods to the scanner class:

 
      public void SetSource(CompilerContext context, SourceFile source)
      {
          _context = context;
          _source = source;
          _source.Reset();
      }
 
      public Token GetNext(ref int state, out int start, out int end)
      {
          start = _source.Position;
          Token token = ReadToken();
          end = _source.Position;
 
          return token;
      }
 

I can include my modified scanner class with the rest of the code once I finish.
Coordinator
Jan 30, 2008 at 5:32 AM
Edited Jan 30, 2008 at 5:34 AM
Hi again
I admit, I got totally confused. I thought Babel was the language you were trying to implement/integrate (I think I heard about something like that), but it turns out it is the name of the framework for VS integration; now I get it. Then what is the language you are implementing? Did you get its grammar into Irony format (a grammar class)? By "color highlighting working", do you mean with a Lex-style grammar following the Babel guidelines, or already in Irony? If it's not a big secret, can you please share what you have, even if it's not working yet? I have some trouble figuring out the Babel stuff, so if you have something, that would be a good place to start.
Note about the code you added. I think GetNext should be implemented as follows:

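public Token GetNext(ref int state, out int start, out int end)
{
    Token token = ReadToken();
    if (token == null)  //just in case; should never happen
    {
        start = end = _source.Position;  //out parameters must be assigned on every path
        return null;
    }
    start = token.Location.Position;  //token start, after any skipped whitespace
    end = _source.Position - 1;       //last character of the token
    return token;
}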

Note that between ReadToken calls current position is just after the end of the previous token; hence "-1" for end parameter. Also, there may be whitespace at this position which ReadToken will skip before starting actually constructing token, so start parameter should be calculated from token, not from source position before ReadToken.
I will be away for a few days, so don't get alarmed if nobody answers for some time; I will be back. Send me stuff whenever you have something to rivantsov at gmail. In the meantime I will try to look at IScanner and how it should work. I think we will need to use/set the state parameter of GetNext as well.
good luck
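To make the start/end contract of GetNext concrete, here is a self-contained sketch. `SketchScanner` and `SimpleToken` are hypothetical stand-ins, not Irony's `Scanner` and `Token` (the "token" here is just a run of non-whitespace characters). It only demonstrates the two points above: `start` must come from the token because ReadToken skips leading whitespace, and `end` is the source position minus one because the position sits just after the token.

```csharp
using System;

// Hypothetical stand-in for Irony's token: only the fields this sketch needs.
public sealed class SimpleToken
{
    public string Text { get; }
    public int Position { get; }   // offset of the token's first character
    public SimpleToken(string text, int position) { Text = text; Position = position; }
}

public sealed class SketchScanner
{
    private readonly string _text;
    private int _position;   // between calls: just after the previous token

    public SketchScanner(string text) { _text = text ?? ""; }

    private SimpleToken ReadToken()
    {
        // Skip leading whitespace before constructing the token.
        while (_position < _text.Length && char.IsWhiteSpace(_text[_position])) _position++;
        if (_position >= _text.Length) return null;
        int begin = _position;
        while (_position < _text.Length && !char.IsWhiteSpace(_text[_position])) _position++;
        return new SimpleToken(_text.Substring(begin, _position - begin), begin);
    }

    public SimpleToken GetNext(ref int state, out int start, out int end)
    {
        SimpleToken token = ReadToken();
        if (token == null) { start = end = _position; return null; }
        start = token.Position;   // from the token, not from the position before ReadToken
        end = _position - 1;      // inclusive end: last character of the token
        return token;
    }
}
```

For the input `"  begin x"`, the first call yields `begin` with start 2 and end 6, not start 0, which is what reading the source position before ReadToken would have reported.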




Jan 30, 2008 at 1:02 PM
Actually the language I am integrating isn't very interesting because it is an older language that isn't very popular. It is called JOVIAL.

I will make the changes to GetNext(). Thanks for the feedback. Also, I agree about the state parameter. I am still unsure about its usage right now and that is why it is being ignored.

Basically the modifications that I have to do to the Managed Babel Framework are very minor, but I will email you what I have so far. It is very basic and doesn't really achieve anything yet but it lets me know that I am on the right track.
Jan 30, 2008 at 11:05 PM
Edited Jan 31, 2008 at 11:07 PM
I now have colorizing and syntax highlighting working.

Now I plan to do brace matching. The managed framework lists two separate types: MatchDouble (e.g. { and }) and MatchTriple (e.g. foreach, {, and }). MatchDouble can be handled using the BraceMatchingFilter, but can it handle MatchTriple?

Also, I don't think it is currently possible to extract the brace locations after a parse is performed. Any suggestions on how I should modify the code to achieve this? I know the functionality already exists in BracematchingFilter.cs. The locations just need to be stored somewhere.
Feb 1, 2008 at 2:59 AM
Actually, I think I got confused about the purpose of the BraceMatchingFilter. To achieve this functionality, I wonder if a Match method should be added to NonTerminal, so it would be defined something like this:

NonTerminal CodeBlock = new NonTerminal("CodeBlock");
CodeBlock.Rule = Function + "{" + BlockContent + "}";
CodeBlock.Match(2, 4);  //Match Double
CodeBlock.Match(1,2,4); //Match Triple

Feb 7, 2008 at 9:25 PM
I am having a hard time figuring out how to modify the code in Irony to include what I need to do for brace matching.

My plan resembles how it is handled in Yacc. So for the following rule in Yacc:

 
Block
    : OpenBlock CloseBlock                   { Match(@1, @2); }
    | OpenBlock BlockContent CloseBlock      { Match(@1, @3); }
    | OpenBlock BlockContent error           { CallHdlr("missing '}'", @3); }
    | OpenBlock error CloseBlock             { Match(@1, @3); }
    ;

I would like to do in Irony this way:
NonTerminal Block = new NonTerminal("block");
Block.Rule
   = OpenBlock + CloseBlock
   | OpenBlock + BlockContent + CloseBlock;
Block.Match(0, 0, 1);
Block.Match(1, 0, 2);

So I thought I would add this code to the NonTerminal class:

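      //Maps a rule (alternative) index to the element indexes that form a matched pair/triple.
      //A dictionary avoids the ArgumentOutOfRangeException that List.Insert throws
      //when Match is not called in strictly increasing ruleIndex order.
      private Dictionary<int, int[]> matches = null;
      public void Match(int ruleIndex, params int[] elementIndexes)
      {
          if (matches == null)
          {
              matches = new Dictionary<int, int[]>();
          }
          matches[ruleIndex] = elementIndexes;
      }

      public IDictionary<int, int[]> Matches
      {
          get { return matches; }
      }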

Then when I parse a file I will need the parser to build an array of Tokens containing the matched braces. The BraceMatchingFilter almost does this and could with a little modification. From your comments I gathered this class should only be used for languages that allow alternate braces to be used. So I am not sure how to modify the compiler to make this happen. Any help would be appreciated.
Coordinator
Feb 7, 2008 at 10:38 PM
What exactly do you need? Which method does Babel call to get info about braces? I understand you need some kind of tracking of braces inside the parser/scanner, but how will this information be exposed to and accessed by the Babel integration code?

Without knowing this "purpose", just guessing that you need the list of unclosed braces, we can look at Parser Stack (data buffer inside parser) - it contains all unclosed braces that were not yet reduced into higher-level structures.
Feb 7, 2008 at 11:15 PM
Sorry, I guess I should have mentioned that. Here is what Babel needs:

IList<TextSpan[]> Braces;
 
public struct TextSpan
{
  public int iEndIndex;
  public int iEndLine;  
  public int iStartIndex;
  public int iStartLine;
}

So for MatchDouble the array would contain two TextSpans and three for MatchTriple. TextSpan is in the VisualStudio namespace so I am not expecting to use that, something similar will do.
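A minimal sketch of how such a list could be built from a token stream, under stated assumptions: tokens carry their text and a zero-based line/column, a token never spans lines, and the end index is exclusive. `BraceSpan`, `PositionedToken` and `BraceCollector` are hypothetical names; `BraceSpan` only mirrors the four TextSpan fields so the core code needs no VS reference. It covers MatchDouble pairs only and uses a stack so nested braces pair correctly; unmatched braces are simply left out.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in for the VS TextSpan (same four fields).
public struct BraceSpan
{
    public int iStartLine, iStartIndex, iEndLine, iEndIndex;
}

// Hypothetical token shape: text plus zero-based line/column of its first character.
public sealed class PositionedToken
{
    public string Text; public int Line; public int Column;
    public PositionedToken(string text, int line, int column) { Text = text; Line = line; Column = column; }

    public BraceSpan ToSpan() => new BraceSpan
    {
        iStartLine = Line, iStartIndex = Column,
        iEndLine = Line, iEndIndex = Column + Text.Length   // assumed exclusive end
    };
}

public static class BraceCollector
{
    // closeToOpen maps each closing brace to its opening brace, e.g. "end" -> "begin".
    public static IList<BraceSpan[]> Collect(IEnumerable<PositionedToken> tokens,
                                             IDictionary<string, string> closeToOpen)
    {
        var result = new List<BraceSpan[]>();
        var open = new Stack<PositionedToken>();
        var openers = new HashSet<string>(closeToOpen.Values);
        foreach (var token in tokens)
        {
            if (openers.Contains(token.Text))
                open.Push(token);   // remember the unclosed opening brace
            else if (closeToOpen.TryGetValue(token.Text, out var opener)
                     && open.Count > 0 && open.Peek().Text == opener)
                result.Add(new[] { open.Pop().ToSpan(), token.ToSpan() });   // MatchDouble
        }
        return result;
    }
}
```

A MatchTriple entry would add a third span (for example the `foreach` keyword) to the same array; how that keyword is located is left open here.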
Coordinator
Feb 7, 2008 at 11:44 PM
ok, I get it. Braces property should be exposed by Parser and contain a list of all matching pairs after parsing a source file.
I will put this stuff into parser in the next code drop; for now add some workaround in Parser.
Feb 8, 2008 at 12:40 PM
Thanks. With this and some AST traversal methods (needed for Intellisense) I should be in good shape.
Sep 6, 2008 at 3:05 AM
Hey, guys, nice work on Irony. I have been using it for about two weeks and I easily built up a DSL for internal use within my organization. Love it. I am after a VS Language Service as well and would like to know where you guys are with your efforts to simplify the integration of Irony with a Language Service. Can I get an update? Can I help contribute?

Coordinator
Sep 8, 2008 at 3:59 PM
Hi
That was a long time ago; bmorrison actually succeeded and sent me his code for the language service. It was implemented as a separate assembly, with minimal support from Irony. I would like to give it another try, now with more built-in support in Irony's core classes. I think I have some idea about what is needed. Let's do it; contact me directly and we'll try to cook something up quickly.
Roman  
Nov 24, 2008 at 2:31 AM
I am starting to work with Irony again and noticed the new EditorInfo property for tokens. This will make integrating Irony with a language service very easy and was a great addition. The problem I am currently having is that I can't figure out how to get the TokenTriggers property to work for brace matching. My grammar uses begin and end symbols instead of curly braces. Because of this they are recognized as identifiers and not symbols, so their TokenTriggers property is set to None: the term options set for the symbol are never seen. Is it possible to get Irony to recognize begin or end as a symbol?
Coordinator
Nov 24, 2008 at 2:49 AM

As a quick fix that might work, try registering begin and end as keywords. I'm not sure it will, but it might; otherwise I'll have to take a closer look.

Roman

 

Nov 24, 2008 at 3:34 AM
I tried that and it shows up as a keyword, but no luck on having the TokenTriggers set properly.
Coordinator
Nov 24, 2008 at 3:49 AM

Try another thing – set Priority property on “begin” and “end” to 1 (higher than default 0):

 

Symbol("begin").Priority = 1;
Symbol("end").Priority = 1;

Nov 24, 2008 at 5:30 AM
Ok, that did it. Thanks! Care to explain why that worked? Also, is there a way to get all matched braces from the parser now?
Coordinator
Nov 24, 2008 at 1:09 PM
About getting all matched braces. Currently Irony does not maintain any list of brace pairs. Brace pairs are identified by internal properties of symbol terminals themselves.
You can deduce it by running through Grammar.Terminals list and checking each element's brace-related options and properties (see Grammar.RegisterBracePair method for more info).
Why setting Priority helped in your case... I'll try a short explanation. When the Scanner looks at the next character in the input (let's say it is "b"), it looks up a list of terminals that are candidates for matching input starting with this character. In your case that would be 2 terminals: Identifier, and the "begin" symbol (which is a terminal as well). The scanner then runs through the list and tries to match the input; the first terminal that produces a token wins, and the Scanner returns this token. For our two terminals, both can produce a valid token. By the way, for the parser it does not matter which: it will match the grammar rule correctly in either case. So which token is produced depends on the ordering of terminals in the list, and this is where the Priority property is used. By default, all literal symbols in grammar rules (like "begin" in your case) get the lowest priority, so they are always last in the list. This is why you were getting an Identifier token previously. This rule makes sense because by default we allow "begin" to be an identifier; the Parser decides what it is later. When we changed the priority, we forced the "begin" terminal to come first in the list, so it generates a "symbol" token, which inherits all brace-related flags from the terminal.
Hope this gives some idea why it happens...
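The selection rule described above (try candidates in priority order, first match wins) can be modeled in a few lines. This is a hypothetical simplification, not Irony's Scanner: `SketchTerminal`, `TerminalSelector` and `ScanOne` are made-up names, and "lowest priority" for literal symbols is modeled as "same priority, registered after the identifier". Because the first match wins regardless of length, the same model also reproduces the `ENDTest` problem reported later in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical terminal: a name, a priority, and a matcher returning the
// match length at a position (0 = no match).
public sealed class SketchTerminal
{
    public string Name; public int Priority; public Func<string, int, int> Match;
    public SketchTerminal(string name, int priority, Func<string, int, int> match)
    { Name = name; Priority = priority; Match = match; }
}

public static class TerminalSelector
{
    public static SketchTerminal Literal(string symbol, int priority) =>
        new SketchTerminal(symbol, priority, (text, pos) =>
            string.CompareOrdinal(text, pos, symbol, 0, symbol.Length) == 0 ? symbol.Length : 0);

    public static SketchTerminal Identifier(int priority) =>
        new SketchTerminal("Identifier", priority, (text, pos) =>
        {
            int i = pos;
            while (i < text.Length && char.IsLetterOrDigit(text[i])) i++;
            return i - pos;
        });

    // Candidates are tried in descending priority; OrderByDescending is stable, so
    // equal priorities keep registration order. The first terminal that matches wins.
    public static (string Terminal, string Text) ScanOne(string text, int pos,
                                                         IEnumerable<SketchTerminal> candidates)
    {
        foreach (var t in candidates.OrderByDescending(c => c.Priority))
        {
            int len = t.Match(text, pos);
            if (len > 0) return (t.Name, text.Substring(pos, len));
        }
        return (null, null);
    }
}
```

With `Identifier(0)` registered before `Literal("begin", 0)`, scanning `"begin x"` yields an Identifier token; raising the literal's priority to 1 yields the `begin` symbol instead, and scanning `"ENDTest"` with `Literal("END", 1)` yields just `END`.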
 
Nov 25, 2008 at 2:46 PM
Thanks for the explanation, it all makes sense now. As for braces, we had a short discussion about this back in February, and it seemed you were working on something to make this easier. The terminal list doesn't give me the location of the braces in a parsed file, unless I am missing something. If you look at my comment from February 7, that is what I would need.
Coordinator
Nov 25, 2008 at 4:17 PM
I see your problem. Let me think about this, and I'll try to come up with a quick solution. Most likely it would use BraceMatchingFilter, only the filter should do some extra work, like putting into each opening brace token a link to the matching closing brace token, and vice versa. Then you can quickly scan the token list (now available directly in Compiler) and get the matching pairs; we can even put such a method into Compiler. I'll get back to you soon.
Coordinator
Nov 25, 2008 at 8:18 PM
Actually you really can use the BraceMatchingFilter as is already - just add it in Grammar's constructor and save ref to it in some public field. Then after you parse you can get the list of brace pairs from it. It's a bit ugly now, will try to improve it soon.
Coordinator
Nov 25, 2008 at 10:11 PM
Just checked in an updated version with slightly improved support for brace-pair lists (changeset 19018). Look at the commented code in the Parse button click handler in Grammar Explorer; it is a sample of how you can get the brace list after parsing. It is a list of open-brace tokens, but you can get the matching closing brace from the Token.OtherBrace property.
Hope this is what you need.
Nov 27, 2008 at 5:14 PM
Thanks for your help and the update.  That did the trick.  I have it working with my language service now. 
Jan 14, 2009 at 9:44 PM

Roman,

 

In my grammar, I need certain symbols to have their EditorInfo.Triggers property to include MemberSelect.  Currently, I don’t think this is possible looking at the source code.  I accomplished this by creating a TermOption called MemberSelect and this set EditorInfo appropriately.  Can you please include something to this effect in the official version?

Ben

Coordinator
Jan 15, 2009 at 3:24 PM
No problem, will add it.
Jan 15, 2009 at 8:32 PM
I also have another issue.  All symbols that I have defined with term options, I have also placed their priority to 1 so that they will be scanned as symbols not identifiers by the scanner.  The problem with this is if you have an identifier that begins with the symbol it will reduce the symbol first even though the identifier doesn't contain any spaces.  For example, if I have the symbol "END" and the language file contains an identifier ENDTest.  The scanner will produce an "END" symbol token and a "Test" identifier token.  If I don't change the priority all is well.  I think this might be a bug.
Coordinator
Jan 16, 2009 at 3:52 PM
Hi
I'm aware of the problem, and it already surfaced in other discussions. The proper solution should be actually to patch the token when it is matched "by value" by changing its term from identifier to Symbol. Here is the code of Parser.GetCurrentAction method:

    private ActionRecord GetCurrentAction() {
      ActionRecord action = null;
      if (_currentToken.MatchByValue) {
        string key = CurrentToken.Text;
        if (!_caseSensitive)
          key = key.ToLower();
        if (_currentState.Actions.TryGetValue(key, out action)) {
          _currentToken.SetTerm(SymbolTerminal.GetSymbol(key)); //new code
          return action;
        }
      }
      if (_currentToken.MatchByType && _currentState.Actions.TryGetValue(_currentToken.Terminal.Key, out action))
        return action;
      return null; //action not found
    }

In this case, when the parser recognizes that the token is in fact a special symbol, not an identifier, it patches it, so all the symbol's attributes are put in place.
Try using this version for now; I will post a fixed version soon. I'm in the middle of a big code refactoring right now.
Sorry for the trouble.
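The idea behind the new line in GetCurrentAction (look the token up by value first and, if that succeeds, switch its term to the symbol) can be shown in isolation. This is a simplified, hypothetical model, not Irony's `ActionRecord`/`SetTerm` machinery: `PatchableToken.Term` is a plain string, the current state's actions are a dictionary keyed by symbol text or terminal name, and the MatchByValue/MatchByType flags are omitted. The sketch also uses `ToLowerInvariant` rather than culture-sensitive `ToLower`, which is its own choice. Because the lookup uses the whole token text, `ENDTest` stays an identifier.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token whose term can be back-patched by the parser.
public sealed class PatchableToken
{
    public string Text; public string Term;   // Term: "Identifier" or a symbol key
    public PatchableToken(string text, string term) { Text = text; Term = term; }
}

public static class ValueMatcher
{
    // actions: hypothetical stand-in for the current parser state's action table.
    public static string GetAction(PatchableToken token, IDictionary<string, string> actions,
                                   bool caseSensitive)
    {
        string key = caseSensitive ? token.Text : token.Text.ToLowerInvariant();
        if (actions.TryGetValue(key, out var action))
        {
            token.Term = key;   // back-patch: the identifier becomes the symbol
            return action;
        }
        // Otherwise fall back to matching by the token's terminal type.
        return actions.TryGetValue(token.Term, out action) ? action : null;
    }
}
```

Here the scanner never has to guess: identifiers keep default priority, and only tokens whose full text is expected as a symbol in the current state get patched.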
  
Jan 16, 2009 at 5:49 PM
No problem I can apply the fix.  However, I need the code for SetTerm as well.  It isn't in the current version.  Thanks!
Coordinator
Jan 16, 2009 at 6:08 PM
Oops, sorry, I thought I had it there already.

    //Method used for backpatching symbol terminals; they are initially recognized as identifiers by scanner,
    // but if parser matches token value to literal symbol in grammar, it changes it to SymbolTerminal
    public virtual void SetTerm(BnfTerm value) {
      _term = value;
      Precedence = Term.Precedence;
    }

You might want to add some more copying code here, like some node flags. Not sure, I'm guessing here, I don't have the code working now
Jan 23, 2009 at 3:24 PM
I needed the symbol to be found by the scanner, so I modified the ReadToken method as follows:

//If we have normal token then return it
if (result != null && !result.IsError()) {   
  //restore position to point after the result token   
  _source.Position = _source.TokenStart.Position + result.Length;   
  result.SetTerm(SymbolTerminal.GetSymbol(result.Text));   
  return result;
}

Do you see any issues with this?  Could this be added to the configured version?
Coordinator
Jan 23, 2009 at 5:22 PM
Edited Jan 23, 2009 at 5:24 PM
That would be completely wrong thing to do. Look - you are making ALL tokens to be symbols, even Identifiers and numbers, and string literals! The trouble is that GetSymbol either returns existing symbol, or creates new one if it doesn't exist.

I understand that there's a lot of confusion already with this issue, it's my fault no doubt, and I will try to improve things in the future. Here's a quick summary of things how they should be. 
A language may have a list of keywords that are used as special tokens inside the language: begin, end, void, for, etc. These keywords are usually highlighted in smart editors. In some cases some of these keywords may be RESERVED, meaning that they cannot be used as variable names (identifiers). Other, non-reserved keywords CAN be used as identifiers. So the scanner can reliably recognize only reserved words and change the token's term from identifier to symbol. Non-reserved words can be recognized as keyword or identifier only by the Parser, using the surrounding context.
In Irony we currently have only Keywords, which is kinda both, and this is the source of the trouble. I'm still thinking about how to fix it properly, most likely by introducing two collections, one for keywords and one for reserved words, and recognizing the reserved ones in the scanner.
That's what you can do for now. In the code you suggest, you should probably match token text against reserved words, and change token's term to symbol only if it reserved word.
Jan 23, 2009 at 6:38 PM
I think I understand.  I added a reserved stringset along side of the keywords stringset in the grammar class.  So my change in the scanner now looks this way.

//If we have normal token then return it
if (result != null && !result.IsError()) {   
  //restore position to point after the result token   
  _source.Position = _source.TokenStart.Position + result.Length;    
  if(Data.Grammar.Reserved.Contains(result.Text))
  {
    result.SetTerm(SymbolTerminal.GetSymbol(result.Text));    
  }
  return result;
}
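One detail worth checking in this change: `Reserved.Contains(result.Text)` compares the raw token text, and `SymbolTerminal.GetSymbol` creates a new symbol when none exists (as Roman noted above). In a case-insensitive grammar, depending on how the set compares strings, `BEGIN` would then either be missed or become a second symbol distinct from `begin`. The following is a minimal sketch of a lookup that normalizes to the grammar's own spelling; `ReservedWordTable` and `TryGetSymbolKey` are hypothetical names, not Irony APIs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reserved-word table: maps any accepted spelling to the
// canonical spelling used when the symbol was declared in the grammar.
public sealed class ReservedWordTable
{
    private readonly Dictionary<string, string> _canonical;

    public ReservedWordTable(bool caseSensitive, params string[] words)
    {
        _canonical = new Dictionary<string, string>(
            caseSensitive ? StringComparer.Ordinal : StringComparer.OrdinalIgnoreCase);
        foreach (var w in words) _canonical[w] = w;
    }

    // Returns the canonical symbol key if text is a reserved word, otherwise null
    // (the token then stays whatever the scanner produced, e.g. an identifier).
    public string TryGetSymbolKey(string text) =>
        _canonical.TryGetValue(text, out var key) ? key : null;
}
```

The returned key, rather than `result.Text`, would then be the value passed to `GetSymbol`, so `BEGIN`, `Begin` and `begin` all map to the one symbol declared in the grammar.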
Feb 10, 2009 at 7:46 PM
I have posted an article on Code Project titled Writing your first Visual Studio Language Service. It covers the basics of what I have been able to accomplish in integrating Irony and Visual Studio. I will try to do a part 2 that will cover advanced IntelliSense (user-defined types, referenced libraries, etc.) and traversing the AST.
Coordinator
Feb 11, 2009 at 5:53 PM
This is great news! Thank you for sharing this!