Wednesday, October 22, 2008

FreeCC, the modern JavaCC

If you ever needed to write a parser, you hopefully didn't do it by hand but rather used a parser generator. The two predominant parser generators in Java are ANTLR and JavaCC. JavaCC project however suffers from paralysis for too many years now. The original developers aren't present for too many years now, and the current set of active committers didn't make any significant improvements for years. My FreeMarker colleague Jon Revusky has recently taken interest in it (FreeMarker having used JavaCC in the last six years, when Jon ditched the hand-written parser of the 1.x versions). Since his improvements to the JavaCC codebase weren't accepted by mainstream (largely dormant) JavaCC project, he did the only correct move: he forked it.

Enter FreeCC.

FreeCC is backwards compatible on source level with JavaCC - it can generate parsers from existing JavaCC sources, and the parsers behave identically (since it's a continuation of the original codebase with different name, it's not too surprising, but doing the work under fresh name, it doesn't hurt to emphasize), FreeCC, however has some modern new amenities, such as:


  • typesafe collections in its own code, instead of Java 1.1 Vector and Hashtable

  • instead of hardcoded prints in the source code, the parsers are generated using editable templates (can you say "multiple target languages support?")

  • other long overdue source code sanitization

  • generated parsers don't rely on static singletons anymore


These are all nice and dandy; code reworks in particular improve the project's long term maintainability by making its comprehension more easy for newly joining developers (in this regard, FreeCC is much more welcoming than JavaCC for a newbie hacker). These however don't really give you much of an advantage when you're writing your grammars. The following ones do, and they're the real kickers:

  • code injection feature eliminates (or at least strongly reduces) the need for manual post-editing of files

  • grammar include feature, which allow smaller grammar files to be reused in larger ones (with JavaCC, you had to copy-paste), as well as allow you to organize larger grammars into separate smaller files.


These features are also the sign of what kind of features are still to come: features that provide you with modern conveniences a programmer in a need of a parser generator would wish to have. These two new features above help you create maintainable grammar sources, and they are just the start. If you adopt these features however, there's no going back. Since JavaCC doesn't have these features, your grammar files will no longer be compatible with JavaCC, only with FreeCC. But you'll hardly want to go back. FreeCC has taken this particular parser generator codebase much further in its few months of existence than the JavaCC project did in years, and is gaining traction. Given the fact that FreeCC is a continuation of the current JavaCC codebase (which didn't really progress further since the fork), it is really risk-free to try it out for your next project (or even in current project!) instead of JavaCC. You can also expect that the developer will be more open to your feature requests, as Jon has a good track record of listening to community wishes in FreeMarker.

FreeCC is the JavaCC you'd want to use in 21st century.

At FreeMarker project, we've already switched to FreeCC; FreeMarker 2.4.0 and 2.3.15 will both have a parser built using FreeCC. Since Jon also works on FreeMarker, he's truly eating his own dog food. (Actually, there's even more to that. Since FreeCC in turn uses FreeMarker as the templating system for its output, he's eating his own dog food doubly! This seemingly creates a circular dependency between FreeMarker and FreeCC, except that luckily all that was needed was a FreeMarker JAR with parser built still using JavaCC to bootstrap the process initially; the projects are self-sufficient since and don't need JavaCC anymore.)

Wednesday, October 08, 2008

HTML encapsulated JSON

Continuing the early morning post on having an XSL-like solution for JSON, where your webapp only outputs JSON files, and has attached stylesheet(s) that the browsers can use to display it as a nicely formatted HTML it intended for human audience. All solutions I outlined there needed an active change to existing technologies: custom HTTP header, or extension to JSON, but in any case they would need explicit support from browsers. 


In other words, they would never get widely adopted. 

But then, just as I went to sleep, it hit me that there's a solution that operates completely within the currently existing technologies. I'll call this solution "HTML encapsulated JSON". The premise is that you create a simple HTML page that has the JSON payload in its body, and has script elements that pull in the JSON-to-DOM transforming script. Something like this:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script src="/carlist-json-to-html.js" type="text/javascript" />
<link rel="stylesheet" type="text/css" href="carlist.css" />
</head>
<body>
<pre>
{ "cars": [
{ "color": "red",
"make": "Ford",
"model": "Mustang",
"year": 1986,
"price": {
"amount": 250,
"currency": "USD"
}
},
{ "color": "yellow",
"make": "Mazda",
"model": "6",
"type": "GT",
"year": 2002,
"price": {
"amount": 14000,
"currency": "USD"
}
}
],
"dealer": {
"name": "Honest Joe",
"address": {
"street": "123 7th Avenue",
"city": "Dustfield"
},
"phone": "1-456-789789"
}
}
</pre>
</body>
</html>

As you can see, the JSON is put into html/body/pre (using "pre" to be fully DTD conformant). A machine client should be able to easily parse it out from there. Granted, it'll need to use both an XML parser and then a JSON parser, but that shouldn't be too big deal.
But the best part is definitely that you can very easily transform this into a nicely formatted HTML, by including a script (emphasized in red) to build a DOM from the JSON extracted from the body. Actually, you can just generate a fairly naked HTML, and then use a separate CSS stylesheet (as shown above) to format the HTML for presentation! And it's all standards compliant, and works with any modern browser!
Next, I'll try to come up with typical code for a JavaScript code template that would be used for a JSON-to-DOM conversion.

CSS for JSON

The back-in-the-day promise of XSL stylesheets was that you could output an XML document from a HTTP request, and have it processed as-is by a machine recipient, or transformed into a pretty HTML for human viewing automatically. 


Seeing how I (and probably many others) do prefer JSON to XML nowadays for my machine-to-machine communication, I'm thinking of an equivalent for JSON. In similar vein, XSL is almost entirely replaced by CSS today, and I keep thinking that we'd need an XSL or CSS equivalent for JSON.

Actually, we probably don't even need a new language for describing the transform. JavaScript would probably do just fine - read the JSON, build the DOM.

The other problem, assuming we already have the language for transforming JSON to pretty formatted HTML: how would you declare a transform in a JSON document?

In XML, you can use a processing instruction. In HTML, you can use the <link> element. Now, unlike XML's processing instructions, or HTML's <link> element, JSON notoriously lacks any support for any out-of-band information that isn't the actual content. Heck, it doesn't even support comments!

I can see three possible ways around it:
  1. Declare the transform in the transport header, i.e. just slap on an "X-JSON-Stylesheet" or similar header into the HTTP response. The obvious problem is that this ties the solution to the transport.
  2. Add it to a conventional location into the JSON data itself. I.e. have the top-level object literal have a "__metadata__" object, and put this and similar other information into it. i.e.:
    { "__metadata__": { "links": [{"url": "myStyleSheet.cssjs", "type": "stylesheet"}]} ... }
    The obvious problems are: a) the top-level value in a JSON document ain't necessarily an object (can be an array, or even a primitive value) b) some JSON based formats won't be happy about having extra elements with magic names inside of them.
  3. Extend JSON to add a syntactic equivalent of XML processing instructions. Honestly, I prefer this approach. Just adding single-line comment support, and then having a special prefix for the "processing instruction" comment (i.e. using a question mark, to keep it similar with XML PE) would do, I think:
    //?stylesheet myStyleSheet.cssjs
    { real payload goes here ... }
In any case, it is up to browser vendors to get the ball rolling on this - both the declaration mechanism, and the transform language.

I think this would be a great technology to have, where I could just output JSON from my web applications, and have it both consumed by software, and presented in nicely human readable form in browsers.