This project has moved and is read-only.

Willing to Contribute? Here is a To-Do list for Release

July 07, 2011.
Hi everybody!
These are things that, in my opinion, need to be done before we declare the Release. Any help and contributions are welcome!
For the time being, I will NOT be working on any of these items, or in this area, so feel free to pick any item and start hacking. I will be finishing the Interpreter.
If you see something you would like to work on, make sure you are in contact with other people who may be working on the same item. Once you have selected an item, find the existing discussion related to it: either follow the link to the discussion thread in the item description (if there is one), or search the Discussions page for a thread about this item. If there is no existing thread, start a new one with a title matching the item title. Then post a message to this page's discussion thread announcing your intent to work on the item, with a link to the item's discussion.
I will update the links to discussion threads on this page.

Scanning and Terminals

P1: Create LineContinuationTerminal (DONE)
A terminal class to handle continuation symbols like "_" in VB; use it in the GwBasic grammar.
Note: the MiniPython grammar in Samples uses a CodeOutlineFilter that handles the continuation symbol "\". For a simpler case like GwBasic, however, that approach seems overcomplicated; it is better to have a simple terminal just for this.
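The core of such a terminal is a small matching check. Below is a Ruby sketch (Irony itself is written in C#); match_line_continuation and tokenize are hypothetical names, and the sketch assumes the continuation symbol may be followed only by spaces or tabs before the line break.

```ruby
# Hypothetical helper: recognizes a continuation symbol (VB-style "_") that is
# followed only by spaces/tabs and then a line break. Returns the position just
# past the line break, or nil if there is no continuation at `pos`.
def match_line_continuation(src, pos, symbol = "_")
  return nil unless src[pos, symbol.length] == symbol
  i = pos + symbol.length
  i += 1 while src[i] == " " || src[i] == "\t"
  if src[i, 2] == "\r\n"
    i + 2
  elsif src[i] == "\n" || src[i] == "\r"
    i + 1
  end
end

# Toy tokenizer: a matched continuation is skipped like whitespace, so no
# NEWLINE token is produced and the statement continues on the next line.
def tokenize(src)
  tokens = []
  pos = 0
  while pos < src.length
    if (after = match_line_continuation(src, pos))
      pos = after
    elsif src[pos] == "\n"
      tokens << :NEWLINE
      pos += 1
    elsif src[pos].match?(/\s/)
      pos += 1
    else
      word = src[pos..-1][/\A\S+/]
      tokens << word
      pos += word.length
    end
  end
  tokens
end
```

With this sketch, tokenize("PRINT A; _\nB\n") returns ["PRINT", "A;", "B", :NEWLINE]: the continued line produces no NEWLINE token of its own.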

P2: Handling embedded documentation (Xml comments) (Discussion thread)
A terminal to handle Xml comments or other types of embedded documentation ("=begin/=end" in Ruby, DocLines in Python). The terminal (or a TokenFilter?) should assemble a documentation block from multiple lines and put it into the ParseTreeNode.Documentation field (to be added) of the following parse tree node.
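A rough Ruby sketch of the token-filter variant follows. DocToken and attach_documentation are hypothetical, the documentation member stands in for the proposed ParseTreeNode.Documentation field, and "///" is assumed as the doc-comment prefix; Ruby's =begin/=end blocks or Python docstrings would need their own recognizers.

```ruby
# Hypothetical token type; `documentation` stands in for the proposed
# ParseTreeNode.Documentation field.
DocToken = Struct.new(:kind, :text, :documentation)

# Sketch of a token filter: consecutive doc-comment tokens are merged into one
# block, which is attached to the next non-comment token. Ordinary comments
# pass through without interrupting the block.
def attach_documentation(tokens, doc_prefix = "///")
  pending = []
  result = []
  tokens.each do |tok|
    if tok.kind == :comment && tok.text.start_with?(doc_prefix)
      pending << tok.text.delete_prefix(doc_prefix).strip
    elsif tok.kind == :comment
      result << tok
    else
      tok.documentation = pending.join("\n") unless pending.empty?
      pending = []
      result << tok
    end
  end
  result
end
```

In a real parser the block would go to the parse tree node that starts with the documented token, rather than to the token itself.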

P3: Implement IncludeText support (Discussion thread)
A facility to handle an "include" statement that inserts text directly into the source (as opposed to referencing an external module). See the discussion thread.
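One simple way to get textual inclusion is to expand includes before scanning, as in this Ruby sketch. The include "name" syntax, the files hash standing in for the file system, and the method name are all assumptions; a scanner-level implementation would also need to remember which file each line came from, so that error messages point to the right place.

```ruby
# Assumed syntax: a line consisting of   include "name"
INCLUDE_LINE = /\A\s*include\s+"([^"]+)"\s*\z/

# Hypothetical helper: returns the text of `name` with include lines replaced,
# recursively, by the text of the included file. `files` maps names to text
# (a stand-in for the file system); `active` is the current include chain,
# used to detect circular includes.
def expand_includes(name, files, active = [])
  if active.include?(name)
    raise ArgumentError, "circular include: #{(active + [name]).join(' -> ')}"
  end
  text = files.fetch(name) { raise ArgumentError, "file not found: #{name}" }
  text.lines.map do |line|
    m = INCLUDE_LINE.match(line.chomp)
    m ? expand_includes(m[1], files, active + [name]) : line
  end.join
end
```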

P4: Fancy string types (Discussion thread)
Either extend StringLiteral, or create a new terminal to handle the following:
  1. Lua: [[ ... ]] string type; note: the first newline (right after the opening bracket) is ignored. The opening and closing brackets may also contain "=" signs, and the counts must match: [===[ .... ]===]; the same applies to comments --[[ ... ]]
  2. Ruby: '%' notation:
       myString = %&This is my String&
       %Q/This is the same as a double-quoted string./    # '%Q' is treated as a double-quoted string

An interesting case is '%w': %w(foo bar baz) is equivalent to ['foo', 'bar', 'baz']. Maybe a separate terminal should handle this?
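Both scanning rules can be sketched in a few lines of Ruby. The method names are hypothetical, and escapes and string interpolation are deliberately ignored. The Lua part treats the "=" count as the bracket level, so only a closing bracket with the same count ends the string; the Ruby part lets bracket delimiters nest and splits the body on whitespace for '%w'.

```ruby
# Lua long bracket: "[" + N "=" + "[" ... "]" + N "=" + "]".
# A line break right after the opening bracket is not part of the string.
# Returns [value, position after the closing bracket], or nil if no match at pos.
def scan_lua_long_string(src, pos)
  m = /\A\[(=*)\[/.match(src[pos..-1])
  return nil unless m
  close = "]#{m[1]}]"
  start = pos + m.end(0)
  if src[start, 2] == "\r\n"
    start += 2
  elsif src[start] == "\n"
    start += 1
  end
  finish = src.index(close, start) or raise ArgumentError, "unterminated long string"
  [src[start...finish], finish + close.length]
end

PERCENT_PAIRS = { "(" => ")", "[" => "]", "{" => "}", "<" => ">" }.freeze

# Ruby %-literal: %&...&, %Q/.../, %q(...), %w(...). Bracket delimiters nest.
# Returns [value, position after the closing delimiter], or nil if no match.
def scan_percent_literal(src, pos)
  m = /\A%([qQw]?)([^\w\s])/.match(src[pos..-1])
  return nil unless m
  kind, open = m[1], m[2]
  close = PERCENT_PAIRS.fetch(open, open)
  depth = 1
  body_start = pos + m.end(0)
  i = body_start
  while i < src.length
    if src[i] == close && (depth -= 1).zero?
      body = src[body_start...i]
      return [kind == "w" ? body.split : body, i + 1]
    elsif src[i] == open && open != close
      depth += 1
    end
    i += 1
  end
  raise ArgumentError, "unterminated %-literal"
end
```

The same long-bracket scanner could serve Lua's block comments once the leading "--" has been consumed.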

P5: Handling implied "+" between string literals. (Discussion thread)
Ruby allows the "+" operator to be omitted when concatenating string literals:
		myString = "Welcome " "to " "Ruby!"
		=> "Welcome to Ruby!" 

Add support for this to StringLiteral via a new option flag. If the flag is set, the matching code looks beyond the closing quote (skipping whitespace) to check whether another string literal follows; if so, the literals are joined into a single token.
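A Ruby sketch of the proposed flag, limited to simple "..." literals; join_adjacent is a hypothetical option name, not an existing StringLiteral setting, and escapes are kept verbatim rather than decoded.

```ruby
# Scans one simple "..." literal starting at pos. Backslash escapes are kept
# verbatim (not decoded). Returns [contents, position after closing quote] or nil.
def scan_simple_string(src, pos)
  return nil unless src[pos] == '"'
  i = pos + 1
  i += (src[i] == "\\" ? 2 : 1) while i < src.length && src[i] != '"'
  raise ArgumentError, "unterminated string" if i >= src.length
  [src[(pos + 1)...i], i + 1]
end

# Hypothetical option `join_adjacent`: after the closing quote, skip spaces and
# tabs; if another literal follows, append its contents to the same token.
def scan_joined_string(src, pos, join_adjacent: true)
  first = scan_simple_string(src, pos)
  return nil unless first
  value, pos = first
  while join_adjacent
    look = pos
    look += 1 while src[look] == " " || src[look] == "\t"
    nxt = scan_simple_string(src, look)
    break unless nxt
    value += nxt[0]
    pos = nxt[1]
  end
  [value, pos]
end
```

Joining in the scanner keeps the grammar unchanged; the alternative, raised in the comment at the bottom of this page, is a grammar rule that accepts a sequence of string literals.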

P6: Implement HereDoc terminal (Ruby) (Discussion thread)
Implement a terminal class to handle embedded text like HereDoc in Ruby (similar things in other languages?).
A Ruby HereDoc is treated as a double-quoted string; '<<-' allows the ending tag to be indented, while '<<' does not.
Note that Ruby allows starting multiple HereDocs on a single line:
		puts <<BEGIN + "<--- middle --->\n" + <<END
		This is the beginning:
		BEGIN
		And now it is over!
		END
		# this prints the same as:
		puts "This is the beginning:\n<--- middle --->\nAnd now it is over!"
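The key point for a HereDoc terminal is that bodies are read from the lines after the current one, in the order their tags appear. The Ruby sketch below rewrites each heredoc start into an ordinary double-quoted literal; the helper names are hypothetical, and quoted tags, the <<~ form, and the ambiguity with the << operator are not handled.

```ruby
HEREDOC_START = /<<(-?)([A-Za-z_]\w*)/

# Reads lines until the terminator; "<<-" (indented_end) allows whitespace
# around it, plain "<<" requires it alone at the start of the line.
def read_heredoc_body(lines, tag, indented_end)
  body = +""
  loop do
    raise ArgumentError, "unterminated heredoc #{tag}" if lines.empty?
    line = lines.shift
    terminator = indented_end ? line.strip : line.chomp
    return body if terminator == tag
    body << line
  end
end

# Rewrites each heredoc start on a line into an ordinary double-quoted literal.
# Several heredocs may start on one line; their bodies follow in the same order.
def expand_heredocs(src)
  lines = src.lines
  out = []
  until lines.empty?
    line = lines.shift
    tags = line.scan(HEREDOC_START)          # e.g. [["", "BEGIN"], ["", "END"]]
    bodies = tags.map { |dash, tag| read_heredoc_body(lines, tag, dash == "-") }
    index = -1
    out << line.gsub(HEREDOC_START) { bodies[index += 1].inspect }
  end
  out.join
end
```

Applied to the example above (with the terminators at column 0), the five lines collapse into a single puts statement whose argument evaluates to "This is the beginning:\n<--- middle --->\nAnd now it is over!\n".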

P7: Create reduced Ruby grammar (Discussion thread)
To test all the fancy terminals mentioned in the previous sections (the Ruby variations), it would be nice to have a reduced version of the Ruby grammar. Call it MiniRuby?

P8: Add RegEx validation pattern to Identifier terminal; support for specific values (Discussion thread)
Some languages have specific rules for identifiers; for example, 'Oneletter+OneDigit' in GwBasic. Add a property to IdentifierTerminal holding a Regex expression used to validate the identifier.
On a similar theme: Scheme has the notion of a "peculiar identifier" in its syntax definition, an identifier with a fixed list of values:
Peculiar identifier = +|-|...
Investigate how this can be supported. Maybe a Regex pattern listing all variations?
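Both ideas fit into one validation step, sketched here in Ruby. IdentifierRule is a hypothetical class, not IdentifierTerminal's API; the fixed values are escaped and combined into a single Regex, which is one answer to the question above. The GwBasic pattern encodes the one-letter-plus-one-digit rule as read here (digit optional), and the Scheme pattern follows the R5RS initial/subsequent character classes.

```ruby
# Hypothetical sketch: after the identifier terminal has scanned a candidate,
# it is accepted only if it fully matches the validation pattern, or if it is
# one of a fixed list of values (Scheme's "peculiar identifiers").
class IdentifierRule
  def initialize(validation_pattern: nil, fixed_values: [])
    @pattern = validation_pattern
    # Regexp.union escapes each value, e.g. "..." becomes \.\.\.
    @fixed = fixed_values.empty? ? nil : /\A(?:#{Regexp.union(fixed_values).source})\z/
  end

  def valid?(text)
    return true if @fixed&.match?(text)
    @pattern.nil? || @pattern.match?(text)
  end
end

# One letter, optionally followed by one digit (assumed reading of the GwBasic rule).
GWBASIC_ID = /\A[A-Za-z][0-9]?\z/
# R5RS <initial> <subsequent>*; the peculiar identifiers are listed separately.
SCHEME_ID = %r{\A[a-z!$%&*/:<=>?^_~][a-z0-9!$%&*/:<=>?^_~+\-.@]*\z}i
```

For Scheme this would be IdentifierRule.new(validation_pattern: SCHEME_ID, fixed_values: ["+", "-", "..."]), which accepts "..." and "list->vector" but rejects "+x".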

P9: Support for nested comments (Discussion thread)
Examples: {- and -} in Haskell; a similar thing in Scheme with "'(...)", commenting out blocks of code with nested parentheses.
Add support for this to CommentTerminal.
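Nesting support mostly amounts to counting. The Ruby sketch below is a hypothetical helper, not CommentTerminal's actual code; it uses Haskell's {- -} as the default delimiters, and the same depth counter works for any pair of distinct start and end symbols.

```ruby
# Hypothetical helper. Returns the position just past the matching end symbol,
# or nil if no comment starts at `pos`.
def scan_nested_comment(src, pos, open = "{-", close = "-}")
  return nil unless src[pos, open.length] == open
  depth = 0
  i = pos
  while i < src.length
    if src[i, open.length] == open
      depth += 1                 # a nested comment starts
      i += open.length
    elsif src[i, close.length] == close
      depth -= 1                 # the innermost open comment ends
      i += close.length
      return i if depth.zero?
    else
      i += 1
    end
  end
  raise ArgumentError, "unterminated comment starting at #{pos}"
end
```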

P10: Support for "special" treatment of the first line in code file (Discussion thread)
Lua: the first line of a file is ignored if it starts with #. Similarly in CSV files, the first line contains the field names.
One way to implement this: add a LanguageFlag; if it is set, the Scanner calls a special overridable method in Grammar, HandleFirstLine(....).
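The proposed hook could work roughly as in this Ruby sketch. special_first_line? stands in for the suggested LanguageFlag and handle_first_line for HandleFirstLine; both, and the Scanner#prepare step, are hypothetical. Returning the line break instead of dropping the whole line keeps line numbers in error messages unchanged.

```ruby
class Grammar
  # Stands in for the proposed LanguageFlag.
  def special_first_line?
    false
  end

  # Overridable hook: returns the text the scanner should see instead of the line.
  def handle_first_line(line)
    line
  end

  private

  # The line's own line break, or "" if it has none; keeps line numbering intact.
  def line_break_of(line)
    line[/\r?\n\z/].to_s
  end
end

class LuaGrammar < Grammar
  def special_first_line?
    true
  end

  # Lua ignores a first line starting with "#" (e.g. "#!/usr/bin/lua").
  def handle_first_line(line)
    line.start_with?("#") ? line_break_of(line) : line
  end
end

class CsvGrammar < Grammar
  attr_reader :field_names

  def special_first_line?
    true
  end

  # The header row supplies field names and is not scanned as data.
  def handle_first_line(line)
    @field_names = line.chomp.split(",")
    line_break_of(line)
  end
end

class Scanner
  def initialize(grammar)
    @grammar = grammar
  end

  # Returns the text that normal scanning would start from.
  def prepare(src)
    return src unless @grammar.special_first_line?
    nl = src.index("\n")
    first = nl ? src[0..nl] : src
    rest = nl ? src[(nl + 1)..-1] : ""
    @grammar.handle_first_line(first) + rest
  end
end
```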

P11: Implement BackgroundTextTerminal (Discussion thread)
A special terminal for grabbing everything that does not fall into other terminals. Will be used in Wiki and template parsers.
Consider the Wiki grammar as an example. It has a number of terminals with specific start symbols, like "*" for bold.
The Scanner selects a terminal based on the current input symbol, trying to match it to one of the first symbols of the registered terminals. If it fails to match any of them, it should call the background terminal (probably by making it a "fallback" terminal, one that does not declare any first symbols). Now the interesting part: the background terminal should "eat" the text until it hits any of the first symbols of the other terminals. At that point it stops, produces a token, and yields control to the Scanner, which tries to select a terminal again.
To be able to do this (stop at the other terminals' first symbols), the background terminal must be able to "spy" on the other terminals and grab all their first symbols at initialization.
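The scanning loop described above can be sketched in Ruby. WikiTerminal, BackgroundTerminal, and scan_wiki are hypothetical names, not Irony classes; each terminal's pattern is assumed to be anchored with \A and to match at least one character.

```ruby
# Hypothetical terminal description: `firsts` are the start symbols the scanner
# uses to select the terminal; `pattern` is anchored with \A.
WikiTerminal = Struct.new(:name, :firsts, :pattern)

class BackgroundTerminal
  # At initialization, "spy" on the other terminals and collect their firsts.
  def initialize(terminals)
    @stop_symbols = terminals.flat_map(&:firsts).uniq
  end

  # Consumes text up to the next first symbol of any other terminal. The search
  # starts at pos + 1: this terminal is only called when nothing matched at pos
  # (e.g. a lone "*"), so it must consume at least one character.
  def match(src, pos)
    stops = @stop_symbols.map { |s| src.index(s, pos + 1) }.compact
    src[pos...(stops.min || src.length)]
  end
end

def scan_wiki(src, terminals)
  background = BackgroundTerminal.new(terminals)
  tokens = []
  pos = 0
  while pos < src.length
    rest = src[pos..-1]
    term = terminals.find do |t|
      t.firsts.any? { |f| rest.start_with?(f) } && rest.match?(t.pattern)
    end
    text = term ? rest[term.pattern] : background.match(src, pos)
    tokens << [term ? term.name : :text, text]
    pos += text.length
  end
  tokens
end
```

With bold (first symbol "*") and link (first symbol "[[") terminals, scanning "Hello *world*, see [[Home]] or 2 * 3" ends with the text tokens " or 2 " and "* 3": the lone "*" does not start a bold run, so the background terminal takes it.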

P12: Terminal for handling inlined Xml (Discussion thread)
A terminal to scan inlined Xml (Scala).

Miscellaneous projects (more challenging and requiring more effort)

P21: Implement fast compiled delegates to create AST nodes (DONE)
AST nodes are created by the Parser using reflection, which is slow. Implement dynamically compiled delegates that call AST node constructors. For an example, see the Vita project, EntityClassBuilder class, method InstanceCreatorMethod, which creates a compiled method for creating entity instances. Plan: NonTerminal should have a property Creator, a delegate that creates the AST instance. If it is null (the default), the method is built by ParserDataBuilder; the Parser then uses it to create node instances. Place all static creator methods in one static class generated at runtime.
Note that there is already a method pointer for custom AST node creation. The new one is an addition, for the case when the grammar provides only the AST node type.

P22: Switch to .NET BigInteger, Complex, and rational numbers in VS2010 (Discussion thread)
BigInteger and Complex are supported in .NET 4.0 (System.Numerics). An implementation of rational numbers (BigRational) is available here:
A good place to test is Scheme grammar - Scheme supports all these data types.

P23: Console window for Grammar Explorer (Discussion thread)
Create a Console control and integrate it into Grammar Explorer, to show an interactive console directly inside the form instead of the read-only Runtime Output window.

P24: Pretty print facility (Discussion thread)
Implement a "pretty printer": some way to output the parse tree in a nicely formatted form. The formatting options (indentation, etc.) should be provided through a setup class.
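A minimal version of the idea, in Ruby: the settings object carries the indentation string and a width limit, and a node whose children are all leaves is printed on one line when it fits. Node, PrettyPrintOptions, and PrettyPrinter are hypothetical, and the one-line layout rule is just one possible formatting option.

```ruby
# Hypothetical parse-tree node: a label plus child nodes.
Node = Struct.new(:label, :children) do
  def initialize(label, children = [])
    super(label, children)
  end
end

# Hypothetical settings class: indentation per level, and the maximum width
# of a line on which a node may be printed together with its leaf children.
class PrettyPrintOptions
  attr_accessor :indent, :inline_limit

  def initialize(indent: "  ", inline_limit: 40)
    @indent = indent
    @inline_limit = inline_limit
  end
end

class PrettyPrinter
  def initialize(options = PrettyPrintOptions.new)
    @options = options
  end

  # One node per line, children indented one level deeper; a node whose
  # children are all leaves is printed as "label(a b c)" when it fits.
  def render(node, level = 0, out = +"")
    pad = @options.indent * level
    inline = inline_form(node)
    if inline && pad.length + inline.length <= @options.inline_limit
      out << pad << inline << "\n"
    else
      out << pad << node.label << "\n"
      node.children.each { |child| render(child, level + 1, out) }
    end
    out
  end

  private

  def inline_form(node)
    return node.label if node.children.empty?
    return nil unless node.children.all? { |c| c.children.empty? }
    "#{node.label}(#{node.children.map(&:label).join(' ')})"
  end
end
```

Keeping every layout decision behind the options object means a grammar-specific printer could later swap in different rules without touching the tree walk.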

P25: Specifying custom infix operator precedence and associativity on per-file basis. (Discussion thread)
Haskell allows this through its infix, infixl and infixr fixity declarations.
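Fixity declarations can be supported if operator precedence is looked up in a table that declarations update while parsing, rather than being fixed when the grammar is built. The Ruby sketch below shows the idea with precedence climbing; it is not how Irony's LALR parser works, the declaration and token formats are simplified, and ExprParser is a hypothetical name.

```ruby
Fixity = Struct.new(:assoc, :prec)   # assoc is :left, :right or :none

class ExprParser
  ASSOC = { "infixl" => :left, "infixr" => :right, "infix" => :none }.freeze

  def initialize
    # Operators without a declaration default to infixl 9, as in Haskell.
    @fixities = Hash.new(Fixity.new(:left, 9))
  end

  # Processes a declaration such as "infixr 5 ++" (fixed format assumed).
  def declare(line)
    kind, prec, *ops = line.split
    ops.each { |op| @fixities[op] = Fixity.new(ASSOC.fetch(kind), Integer(prec)) }
  end

  # Parses "1 + 2 * 3" (operands and operators separated by spaces) and
  # returns a fully parenthesized string that makes the grouping visible.
  def parse(text)
    @tokens = text.split
    @pos = 0
    result = parse_expr(0)
    raise ArgumentError, "unexpected #{@tokens[@pos]}" if @pos < @tokens.length
    result
  end

  private

  # Precedence climbing, driven by whatever the fixity table currently says.
  def parse_expr(min_prec)
    lhs = @tokens[@pos]
    @pos += 1
    while (op = @tokens[@pos])
      fixity = @fixities[op]
      break if fixity.prec < min_prec
      @pos += 1
      rhs = parse_expr(fixity.assoc == :right ? fixity.prec : fixity.prec + 1)
      lhs = "(#{lhs} #{op} #{rhs})"
      nxt = @tokens[@pos]
      if fixity.assoc == :none && nxt && @fixities[nxt].prec == fixity.prec
        raise ArgumentError, "cannot chain non-associative #{op} with #{nxt}"
      end
    end
    lhs
  end
end
```

Each file would get its own parser instance, so "1 + 2 + 3" groups to the left after "infixl 6 +" and to the right after "infixr 6 +".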

P26: Finish LINQ syntax implementation in c# grammar. (Discussion thread)
This is mostly to explore an additional twist: scoped reserved words.
Extract from the C# language specification, section 7.16.1:

Query expressions contain a number of “contextual keywords”, i.e., identifiers that have special meaning in a given context. Specifically these are from, where, join, on, equals, into, let, orderby, ascending, descending, select, group and by. In order to avoid ambiguities in query expressions caused by mixed use of these identifiers as keywords or simple names, these identifiers are considered keywords when occurring anywhere within a query expression.
For this purpose, a query expression is any expression that starts with “from identifier” followed by any token except “;”, “=” or “,”.

This introduces the concept of "scoped" reserved words: some keywords are reserved words (in Irony's sense) only inside a particular scope or statement. Suggestions on how to implement this?
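One possible approach, sketched in Ruby, is a pass over the token stream that turns contextual words into keywords only inside a query expression. The start condition follows the spec extract above; where a query ends (the next ";" or an unmatched closing bracket at the query's nesting level) is a simplifying assumption, and classify with its tiny keyword lists is hypothetical.

```ruby
CONTEXTUAL = %w[from where join on equals into let orderby ascending descending select group by].freeze
RESERVED = %w[in new].freeze                 # tiny stand-in for C#'s ordinary keywords
IDENT = /\A[A-Za-z_]\w*\z/
OPENERS = ["(", "[", "{"].freeze
CLOSERS = [")", "]", "}"].freeze

# Returns [kind, token] pairs; kind is :keyword, :identifier or :other.
def classify(tokens)
  result = []
  in_query = false
  depth = 0
  tokens.each_with_index do |tok, i|
    # Spec 7.16.1: "from identifier" followed by any token except ; = ,
    follower = tokens[i + 2]
    if !in_query && tok == "from" && tokens[i + 1]&.match?(IDENT) &&
       follower && ![";", "=", ","].include?(follower)
      in_query = true
      depth = 0
    end
    kind =
      if in_query && CONTEXTUAL.include?(tok) then :keyword
      elsif RESERVED.include?(tok) then :keyword
      elsif tok.match?(IDENT) then :identifier
      else :other
      end
    result << [kind, tok]
    next unless in_query
    # Track nesting so the query ends at ";" or an unmatched closing bracket.
    if OPENERS.include?(tok)
      depth += 1
    elsif CLOSERS.include?(tok)
      depth -= 1
      in_query = false if depth.negative?
    elsif tok == ";" && depth.zero?
      in_query = false
    end
  end
  result
end
```

In "var q = from c in customers where c.Age > 30 select c; where();" only the first "where" becomes a keyword; the second one, after the ";", stays an identifier.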

P27: Refactor/rebuild unit tests projects. (Discussion thread)
This is a big one. The current state is a shameful mess. Existing unit tests take shortcuts: they instantiate terminals and call TryParse directly.
We need to implement more complete tests, with test grammars, systematically covering all aspects of functionality.

Advanced Research Projects

P31: Survey popular languages
Survey modern languages and identify which facilities (mainly terminals) Irony is missing for building parsers for these languages. Suggested languages to look at: Scheme, Python, Ruby, GwBasic, Lua, Tcl, JavaScript, Clojure, Scala, Boo, Haskell, Erlang, Groovy, Smalltalk, PHP, Go (the new language from Google).
All these are modern, popular languages, but why not include older ones like Cobol? Even if we never write a Cobol compiler, somebody may need to write a colorizer for a Cobol editor.

P32: Token-Preview-based Semi-Automatic Conflict Resolution (DONE)
This topic deserves its own page: Automatic Conflict Resolution.

P33: Macro facilities
Investigate how to implement macro facilities, from plain C macros up to advanced systems like the Scheme macro system.

P34: Template processor
Parser for template files, which include "text" and embedded script commands (Ruby rhtml format as an example).
Things to consider: embedded grammars or sub-grammars, i.e. a separate grammar attached to a terminal and used to parse subsections of the text.
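The first step of such a processor, separating literal text from embedded script, can be sketched in Ruby. The <% %> and <%= %> delimiters follow ERB, which rhtml uses; split_template and parse_template are hypothetical names, and the "<%%" escape is not handled. In the sub-grammar design described above, each code segment would then be handed to the script language's own parser, represented here by a lambda.

```ruby
# Splits a template into [:text, s], [:code, s] and [:expr, s] segments.
def split_template(src)
  segments = []
  pos = 0
  while (start = src.index("<%", pos))
    segments << [:text, src[pos...start]] if start > pos
    finish = src.index("%>", start + 2) or raise ArgumentError, "unclosed <% at #{start}"
    if src[start + 2] == "="
      segments << [:expr, src[(start + 3)...finish].strip]   # <%= value %>
    else
      segments << [:code, src[(start + 2)...finish].strip]   # <% statement %>
    end
    pos = finish + 2
  end
  segments << [:text, src[pos..-1]] if pos < src.length
  segments
end

# Hands every embedded segment to the sub-grammar's parser (any callable).
def parse_template(src, code_parser)
  split_template(src).map do |kind, body|
    kind == :text ? [kind, body] : [kind, code_parser.call(body)]
  end
end
```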

P35: Basic code analysis algorithms
Implement some basic code analysis: "lattices" and all that stuff. Maybe not for this release; this is more of a long-term item. The goal is to implement algorithms for detecting loops, uninitialized variables, unused code, etc. The most approachable introduction to the subject I have found is in "Advanced Compiler Design and Implementation" by Steven Muchnick.

Last edited Aug 2, 2011 at 10:28 PM by rivantsov, version 25


pgeerkens Oct 13, 2012 at 9:37 PM 
re: P5: Handling implied "+" between string literals
This seems more naturally a grammatical construct caught by the Parser than a lexical one caught by the Scanner. Is there a particular reason why that is incorrect?