Wednesday, October 22, 2008

FreeCC, the modern JavaCC

If you ever needed to write a parser, you hopefully didn't do it by hand but rather used a parser generator. The two predominant parser generators in Java are ANTLR and JavaCC. JavaCC project however suffers from paralysis for too many years now. The original developers aren't present for too many years now, and the current set of active committers didn't make any significant improvements for years. My FreeMarker colleague Jon Revusky has recently taken interest in it (FreeMarker having used JavaCC in the last six years, when Jon ditched the hand-written parser of the 1.x versions). Since his improvements to the JavaCC codebase weren't accepted by mainstream (largely dormant) JavaCC project, he did the only correct move: he forked it.

Enter FreeCC.

FreeCC is backwards compatible on source level with JavaCC - it can generate parsers from existing JavaCC sources, and the parsers behave identically (since it's a continuation of the original codebase with different name, it's not too surprising, but doing the work under fresh name, it doesn't hurt to emphasize), FreeCC, however has some modern new amenities, such as:

  • typesafe collections in its own code, instead of Java 1.1 Vector and Hashtable

  • instead of hardcoded prints in the source code, the parsers are generated using editable templates (can you say "multiple target languages support?")

  • other long overdue source code sanitization

  • generated parsers don't rely on static singletons anymore

These are all nice and dandy; code reworks in particular improve the project's long term maintainability by making its comprehension more easy for newly joining developers (in this regard, FreeCC is much more welcoming than JavaCC for a newbie hacker). These however don't really give you much of an advantage when you're writing your grammars. The following ones do, and they're the real kickers:

  • code injection feature eliminates (or at least strongly reduces) the need for manual post-editing of files

  • grammar include feature, which allow smaller grammar files to be reused in larger ones (with JavaCC, you had to copy-paste), as well as allow you to organize larger grammars into separate smaller files.

These features are also the sign of what kind of features are still to come: features that provide you with modern conveniences a programmer in a need of a parser generator would wish to have. These two new features above help you create maintainable grammar sources, and they are just the start. If you adopt these features however, there's no going back. Since JavaCC doesn't have these features, your grammar files will no longer be compatible with JavaCC, only with FreeCC. But you'll hardly want to go back. FreeCC has taken this particular parser generator codebase much further in its few months of existence than the JavaCC project did in years, and is gaining traction. Given the fact that FreeCC is a continuation of the current JavaCC codebase (which didn't really progress further since the fork), it is really risk-free to try it out for your next project (or even in current project!) instead of JavaCC. You can also expect that the developer will be more open to your feature requests, as Jon has a good track record of listening to community wishes in FreeMarker.

FreeCC is the JavaCC you'd want to use in 21st century.

At FreeMarker project, we've already switched to FreeCC; FreeMarker 2.4.0 and 2.3.15 will both have a parser built using FreeCC. Since Jon also works on FreeMarker, he's truly eating his own dog food. (Actually, there's even more to that. Since FreeCC in turn uses FreeMarker as the templating system for its output, he's eating his own dog food doubly! This seemingly creates a circular dependency between FreeMarker and FreeCC, except that luckily all that was needed was a FreeMarker JAR with parser built still using JavaCC to bootstrap the process initially; the projects are self-sufficient since and don't need JavaCC anymore.)

1 comment:

Anonymous said...

JavaCC definitely needs an upgrade, but this goes in the wrong direction. The main thing is that JavaCC needs to get rid of the 1999-era coding style and start using interfaces so we can do our own underlying implementations. Token, for instance, is still a concrete class that exposes instance variables without getters and setters! It is astounding that we still do this in 2008. In my brief look at FreeCC, it seems that the problem gets even worse. Token not only preserves the bad stuff, it also adds two dozen unnecessary methods.