
Thursday, September 09, 2010

Running YourKit UI remotely

I got into a somewhat unique situation a few weeks ago: I had a 16GB JVM heap dump that I had to analyze using a memory profiler. It was obviously hopeless to analyze it on my laptop, which only has 4GB of RAM. Fortunately, I was given access to a computer half the world away that had 64GB of RAM. So I moved my YourKit profiler instance onto that machine and tried to run it using X11 tunneled through SSH.


The good news is: it works.

The bad news is: it needs tweaking to work.

The first thing I ran into was that the GUI was hideously slow. I would click a button in the profiler, and feedback would take several minutes. I'm not joking. Several minutes. Googling revealed a Sun Bug Database item, "Antialiasing/Compositing is slow on remote display". It suggests that to run GUI Java apps over X11, one should add

-Dsun.java2d.pmoffscreen=false

as a JVM startup flag. I added it to the profiler's yjp.sh launch script, and holy moly, it became much better. Granted, it was now rendering screens progressively from top to bottom, but it did so in a matter of seconds, not minutes. It now felt like something out of 1999 instead of 1993. Further Googling for various combinations of "X11", "slow", and "ssh" turned up sites suggesting it's possible to make X11 over SSH a bit faster by telling SSH to use the Arcfour or Blowfish ciphers instead of AES, as well as to compress the traffic:

ssh -c arcfour,blowfish-cbc -C -X host

This led to further improvements; the progressive rendering got faster by another factor of two. That's as good as it gets. Now it feels like something out of circa 2001. I can work with that.

Tuesday, March 17, 2009

Relativity of simultaneity

A problem with staying in a profession for long years is that as time passes, there are fewer and fewer things that can truly excite or surprise you. That's why I was particularly delighted to experience a moment of sudden enlightenment while listening to Rich Hickey talk about "Persistent Data Structures and Managed References" last Thursday at QCon in London. The title of the talk might not sound exciting, but I have to tell you, if this were the only talk I had attended, it alone would have been worth the trip. I don't say this often. The slides are here, although you're missing out on Rich's incredibly good verbal presenting style if you just read them, obviously.

Rich's view (with which I have to agree; in the time I've known Rich, I have yet to hear the first thing from him I'd disagree with, whether it be concurrency, memory models, or ideas for what to do in San Francisco if you only have a single day to explore it) is that the idea of variables in traditional programming languages is broken, because it is based on there being a single thread of control at a time. This used to be true for single-threaded programs, and even holds to a degree for multi-threaded programs running on single-CPU systems.

He argues that we need well-defined time models for state changes in our concurrent computational systems. Without a time model, we fail to capture the situation where two observers (threads, CPUs) can observe two events (changes of state) in different order.

Then it suddenly hit me! I learned this stuff in school! Relative timelines, observers... Rich is talking about relativity of simultaneity!

(Which is one of the simpler consequences of Einstein's special theory of relativity.)

Wait a moment. Finding such a surprising parallel between the physical world around us and our digital computational systems seems unlikely at first, but thinking more about it, it makes perfect sense. For event timelines to appear differently to different observers, we only need two prerequisites: truly parallel processing, and a finite speed of change propagation. Both hold for the real world (quite massively parallel, with changes propagating at most at the speed of light) and for digital computational systems (parallel already at two CPU cores, with changes occurring in CPU registers and needing time to propagate down through the hardware memory model before they're observable by other CPUs).
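
To make the disagreement concrete, here's a minimal Java sketch of my own (not something from Rich's talk) in which two threads can disagree about the order of two events. With no synchronization in the picture, the Java memory model explicitly permits the reader to observe the second write without the first:

public class RelativityDemo {
    static int x = 0, y = 0; // deliberately neither volatile nor synchronized

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(new Runnable() {
            public void run() {
                x = 1; // event A: first, from the writer's perspective
                y = 1; // event B: second, from the writer's perspective
            }
        });
        Thread reader = new Thread(new Runnable() {
            public void run() {
                int ry = y, rx = x;
                if (ry == 1 && rx == 0) {
                    // saw event B but not event A: this observer experienced
                    // a different ordering of the two events
                    System.out.println("observed B without A");
                }
            }
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
    }
}

Any single run will rarely catch the anomaly - in practice you'd loop over this many times - but the point stands: nothing in the program forbids it.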

One of the wonderful things about Rich is that he's able to express these notions very clearly. It is all really obvious when you think about it, and I can't even claim I haven't been aware of it for at least three years now. It's just that before hearing this talk, I always had a very slight hope that someday, someone (infinitely wiser than me) would somehow be able to, you know, solve this. Eliminate this problem. What I realize now is that it's inherent. You can no sooner eliminate relativity of simultaneity, and the rest of the consequences it brings to the table, from our computational systems than you could cancel its effects in the physical world.

Rich does the only sane thing to do with a problem you can't eliminate: he embraces it. His programming language, Clojure, is an attempt at being the easiest way to write correct concurrent programs. Emphasis on all of "easy to write", "concurrent", and "correct". I learned about Clojure sometime last year when it came up on the jvm-languages mailing list. I read the documentation on the website and came away thoroughly impressed. Then I met Rich and saw a 30-minute Clojure presentation last year at the JVM language summit, saw the 60-minute version of the same talk now at QCon, and I'm still impressed. Clojure embraces the problem of concurrent state modification by encapsulating state into "references" - the only mutable data types in Clojure - which all point to immutable values. The references can change over time to point to new values, though, and the big idea here is that each reference has defined concurrency semantics for how and when change occurs.

Now, there's likely a huge number of possible temporal semantics for how an entity's state can change. Clojure identifies several common semantic categories, though, and provides them for ease of use. We have synchronous coordinated (Clojure has an STM for that), synchronous uncoordinated, asynchronous uncoordinated, as well as isolated (thread local). I think you can build whatever you need using those.
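
If it helps to anchor these categories in familiar Java terms, here are some rough analogies of mine (not Rich's) using java.util.concurrent - with the caveat that Clojure's synchronous coordinated references (the STM) have no direct JDK counterpart:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class ReferenceFlavors {
    // synchronous uncoordinated: atomic compare-and-swap on a single reference
    static final AtomicReference<String> atomLike = new AtomicReference<String>("v0");

    // asynchronous uncoordinated: changes queued to a single worker that owns the state
    static final ExecutorService agentLike = Executors.newSingleThreadExecutor();

    // isolated: each thread sees and mutates its own copy
    static final ThreadLocal<String> varLike = new ThreadLocal<String>();

    public static void main(String[] args) {
        atomLike.compareAndSet("v0", "v1"); // succeeds only against the expected old value
        agentLike.submit(new Runnable() {
            public void run() { /* mutate the worker-owned state here, later */ }
        });
        varLike.set("thread-private value");
        agentLike.shutdown();
    }
}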

Anyhow, during Rich's talk I had a moment of enlightenment. Not because I heard something entirely new, but because the pieces of the jigsaw puzzle finally fell into place and revealed a picture. The picture does away with a false hope I had, which is good, as it is a clear and definite answer to a question, and having accepted it, it is at least clear to me what the direction forward is. During the rest of the day, I was telling pretty much everyone I met and their dog about how I was shown the inherent manifestation of relativity in finite-speed concurrent systems. (Joe Armstrong was just smirking at it, probably thinking "you wet-behind-the-ears kids, this should've been obvious to you for decades", and then we proceeded to make fun of all n-phase commit protocols, for any finite value of n.)

On multicore systems, there's no longer absolute time. Every core runs on its own relative time, and there's no definite sequencing of events. It's time we built this assumption into our code, and stopped using tools and methodologies that are ignorant of this fact. And as Rich would point out, writing your code in Clojure is a good way toward that goal.

Friday, January 30, 2009

Speaking at QCon

Just a heads up that I'll be speaking at QCon in London this March, about using JavaScript in the enterprise. Looking forward to meeting you all there.

Wednesday, October 22, 2008

FreeCC, the modern JavaCC

If you ever needed to write a parser, you hopefully didn't do it by hand but rather used a parser generator. The two predominant parser generators in Java are ANTLR and JavaCC. The JavaCC project, however, has suffered from paralysis for too many years now. The original developers haven't been around for a long time, and the current set of active committers hasn't made any significant improvements in years. My FreeMarker colleague Jon Revusky has recently taken an interest in it (FreeMarker has used JavaCC for the last six years, ever since Jon ditched the hand-written parser of the 1.x versions). Since his improvements to the JavaCC codebase weren't accepted by the mainstream (largely dormant) JavaCC project, he made the only correct move: he forked it.

Enter FreeCC.

FreeCC is backward compatible with JavaCC at the source level: it can generate parsers from existing JavaCC sources, and the parsers behave identically. (Since it's a continuation of the original codebase under a different name, that's not too surprising, but as the work proceeds under a fresh name, it doesn't hurt to emphasize it.) FreeCC, however, has some modern new amenities, such as:


  • typesafe collections in its own code, instead of Java 1.1 Vector and Hashtable

  • instead of hardcoded prints in the source code, the parsers are generated using editable templates (can you say "multiple target language support"?)

  • other long overdue source code sanitization

  • generated parsers don't rely on static singletons anymore


These are all nice and dandy; the code reworks in particular improve the project's long-term maintainability by making it easier for newly joining developers to comprehend (in this regard, FreeCC is much more welcoming than JavaCC for a newbie hacker). They don't, however, really give you much of an advantage when you're writing your grammars. The following ones do, and they're the real kickers:

  • code injection feature eliminates (or at least strongly reduces) the need for manual post-editing of files

  • the grammar include feature, which allows smaller grammar files to be reused in larger ones (with JavaCC, you had to copy-paste), and also lets you organize larger grammars into separate smaller files.


These features are also a sign of what kind of features are still to come: features that provide the modern conveniences a programmer in need of a parser generator would wish for. These two new features help you create maintainable grammar sources, and they are just the start. If you adopt them, however, there's no going back: since JavaCC doesn't have these features, your grammar files will no longer be compatible with JavaCC, only with FreeCC. But you'll hardly want to go back. FreeCC has taken this particular parser generator codebase much further in its few months of existence than the JavaCC project did in years, and it is gaining traction. Given that FreeCC is a continuation of the current JavaCC codebase (which hasn't really progressed since the fork), it is really risk-free to try it out for your next project (or even your current one!) instead of JavaCC. You can also expect the developer to be more open to your feature requests, as Jon has a good track record of listening to community wishes in FreeMarker.

FreeCC is the JavaCC you'd want to use in the 21st century.

At the FreeMarker project, we've already switched to FreeCC; FreeMarker 2.4.0 and 2.3.15 will both have a parser built using FreeCC. Since Jon also works on FreeMarker, he's truly eating his own dog food. (Actually, there's even more to it. Since FreeCC in turn uses FreeMarker as the templating system for its output, he's eating his own dog food doubly! This seemingly creates a circular dependency between FreeMarker and FreeCC, except that luckily all that was needed to bootstrap the process was a FreeMarker JAR with a parser still built using JavaCC; the projects have been self-sufficient since and don't need JavaCC anymore.)

Wednesday, October 08, 2008

HTML encapsulated JSON

Continuing the early morning post on having an XSL-like solution for JSON, where your webapp outputs only JSON, with attached stylesheet(s) that browsers can use to display it as nicely formatted HTML intended for a human audience. All the solutions I outlined there needed an active change to existing technologies - a custom HTTP header, or an extension to JSON - and in any case they would need explicit support from browsers.


In other words, they would never get widely adopted. 

But then, just as I went to sleep, it hit me that there's a solution that operates completely within currently existing technologies. I'll call this solution "HTML encapsulated JSON". The premise is that you create a simple HTML page that has the JSON payload in its body, plus a script element that pulls in the JSON-to-DOM transforming script. Something like this:

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <script src="/carlist-json-to-html.js" type="text/javascript" />
    <link rel="stylesheet" type="text/css" href="carlist.css" />
  </head>
  <body>
    <pre>
{ "cars": [
    { "color": "red",
      "make": "Ford",
      "model": "Mustang",
      "year": 1986,
      "price": {
        "amount": 250,
        "currency": "USD"
      }
    },
    { "color": "yellow",
      "make": "Mazda",
      "model": "6",
      "type": "GT",
      "year": 2002,
      "price": {
        "amount": 14000,
        "currency": "USD"
      }
    }
  ],
  "dealer": {
    "name": "Honest Joe",
    "address": {
      "street": "123 7th Avenue",
      "city": "Dustfield"
    },
    "phone": "1-456-789789"
  }
}
    </pre>
  </body>
</html>

As you can see, the JSON is put into html/body/pre (using "pre" to be fully DTD conformant). A machine client should be able to easily parse it out from there. Granted, it'll need to use both an XML parser and then a JSON parser, but that shouldn't be too big a deal.
But the best part is definitely that you can very easily transform this into nicely formatted HTML, by including a script (the carlist-json-to-html.js reference in the head above) that builds a DOM from the JSON extracted from the body. Actually, you can just generate fairly naked HTML, and then use a separate CSS stylesheet (as shown above) to format it for presentation! And it's all standards compliant, and works with any modern browser!
Next, I'll try to come up with typical code for a JavaScript template that could be used for such a JSON-to-DOM conversion.

CSS for JSON

The back-in-the-day promise of XSL stylesheets was that you could output an XML document from an HTTP request and have it processed as-is by a machine recipient, or automatically transformed into pretty HTML for human viewing.


Seeing how I (and probably many others) prefer JSON to XML for machine-to-machine communication nowadays, I'm thinking of an equivalent for JSON. In a similar vein, XSL has been almost entirely replaced by CSS today, and I keep thinking that we'd need an XSL or CSS equivalent for JSON.

Actually, we probably don't even need a new language for describing the transform. JavaScript would probably do just fine - read the JSON, build the DOM.

The other problem, assuming we already have a language for transforming JSON into pretty formatted HTML: how would you declare the transform in a JSON document?

In XML, you can use a processing instruction. In HTML, you can use the <link> element. But unlike XML's processing instructions or HTML's <link> element, JSON notoriously lacks support for any out-of-band information that isn't the actual content. Heck, it doesn't even support comments!

I can see three possible ways around it:
  1. Declare the transform in the transport header, i.e. just slap on an "X-JSON-Stylesheet" or similar header into the HTTP response. The obvious problem is that this ties the solution to the transport.
  2. Add it to a conventional location in the JSON data itself, i.e. have the top-level object literal contain a "__metadata__" object, and put this and similar other information into it:
    { "__metadata__": { "links": [{"url": "myStyleSheet.cssjs", "type": "stylesheet"}]} ... }
    The obvious problems are: a) the top-level value in a JSON document ain't necessarily an object (it can be an array, or even a primitive value); b) some JSON-based formats won't be happy about having extra elements with magic names inside them.
  3. Extend JSON to add a syntactic equivalent of XML processing instructions. Honestly, I prefer this approach. Just adding single-line comment support, and then having a special prefix for the "processing instruction" comment (i.e. using a question mark, to keep it similar to XML PIs) would do, I think:
    //?stylesheet myStyleSheet.cssjs
    { real payload goes here ... }
In any case, it is up to browser vendors to get the ball rolling on this - both the declaration mechanism, and the transform language.

I think this would be a great technology to have, where I could just output JSON from my web applications, and have it both consumed by software and presented in a nicely human-readable form in browsers.

Tuesday, September 02, 2008

Package private access in Open Source code

I recently got into the same situation three times: someone wanted to use code I wrote in an open source Java project, and they couldn't, because the class/method in question had package private ("default") access, rendering it inaccessible outside of its package.

  • First, Charlie Nutter needs access to package-private class BeanMetaobjectProtocol in Dynalang-MOP.
  • Next, John Arkley needs access to package-private class AllHttpScopesHashModel in FreeMarker to help push it into Spring.
  • Finally, Steve Yegge needs access to the package-private constructor of Context class in Rhino.
Now, you know, when the same thing hits me three times in a row in a short timeframe (incidentally, all three times coming from people with high geek cred), it gets me thinking: what good is package-private access in an open source project anyway?!

It's a very valid question, really. Think about it: people can see the code - it's open. People want to use the code - that's normal. And when they want to, you frustrate them by declaratively preventing them from linking their code to your open source code. They can look, but they can't touch.

That seems wrong.

Of course, you could argue for package private access' validity. Here are some arguments you could come up with:

Package private helps implementation hiding on a package level! Well, duh, it does. However, if a class/method is useful to another class you wrote that lives in the same package, it might be useful to some poor outsider schmuck too! It must already conform to quite rigorous coding standards, as it is used by other classes - ones you wrote in the same package - so it'd better maintain state invariants and so on. Does it really make a difference whether it's another class of yours, or another class of another developer, living in another package? I say: it shouldn't. If you think the class is just an auxiliary and it's cluttering the JavaDoc, just move it to a *.support subpackage instead (Spring does that extensively).

Package private helps you hide bits you don't want to be tied down by backwards compatibility! This is really a corollary of the first one. I used to be big on this one (that's probably why I got asked to loosen up the access restrictions in the first place - because I was the one who put them there). Common wisdom is that once people start using your publicly available API, you'd better not break it in the next release. By making lots of stuff public, you increase the surface area that needs to be kept backwards compatible, right? Right - sort of.

Now, how about this instead: make it public anyway, and just note in a JavaDoc entry (or even better, in an annotation, say @SubjectToChange) that there's no backwards compatibility guarantee on this method. If it's an annotation, people can even have an automated tool check for its use before they upgrade. Hell, you can even have a @BreaksCompatibility annotation! What it boils down to is: don't treat users of your library as children, deciding what's good for them. Inform them that the API is volatile, but open it up, don't close it. It's not really closed anyway, as they can see the source. You're just erecting a glass wall in front of them; they can look, but they can't touch.

And they'll come to bug you about it anyway. 

My point of view right now is that it's better to provide a suggestive hint that an API is volatile in the documentation or an annotation, and let anyone use it at their own risk, than to build a non-negotiable restriction into the code (as in, you can't negotiate it with the compiler; you can still definitely negotiate it with me).
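
For illustration, here's a sketch of what such a hypothetical @SubjectToChange annotation could look like (it doesn't exist in any library I know of; class-file retention is what would let a tool scan a client's dependencies before an upgrade):

import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Documented
@Retention(RetentionPolicy.CLASS) // visible to bytecode-scanning tools, no runtime cost
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.CONSTRUCTOR, ElementType.FIELD})
public @interface SubjectToChange {
    /** Free-form warning, e.g. "internal API, may change in the next minor release". */
    String value() default "";
}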

Mind you, I'm not saying that package private access is completely inadequate for open source libraries. I'll admit there might be valid use cases for it, but if there are, they're very few.

Now, before you accuse me of being a "make everything public" proponent, let me say that this reasoning doesn't apply at all to private access, and only partially applies to protected access. Let's see why:

Private access is completely justified. Let me point out one clear distinction between packages and classes: classes and instances of classes can have state; packages cannot. That's pretty much what makes the difference. Through refactorings, you can end up with methods in a class that violate its state invariants. Other methods in the class can call these methods as steps to transform the object from one valid state to another, even though it might be in an invalid state in the interim. You would never expose such a method publicly. Well, you shouldn't expose it package-privately either; that'd be sloppy programming (see my above remark about package-privately accessible methods having to be coded just as rigorously as public ones). A class (and class instance) is a unit of state encapsulation; a package is not. Hence, classes need private access.
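
Here's a small sketch of what I mean (the class and names are made up for illustration) - between the two private helper calls below, the object's invariant doesn't hold, so nobody outside the class, not even a same-package class, should ever be able to call them individually:

import java.util.ArrayList;
import java.util.List;

public class Ledger {
    private final List<Integer> entries = new ArrayList<Integer>();
    private int balance = 0; // invariant: balance == sum of entries

    public void add(int amount) {
        appendEntry(amount);   // invariant is broken here...
        updateBalance(amount); // ...and restored here
    }

    // these must stay private: each one alone leaves the object inconsistent
    private void appendEntry(int amount) { entries.add(amount); }
    private void updateBalance(int amount) { balance += amount; }
}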

Protected methods remain usable by third-party code, as long as it has classes that extend your base class. It might not be too ideal a constraint (forcing subclassing in order to access functionality), and you might rather want to design your libraries to favor a composition (has-a) over a subclassing (is-a) architecture. But if you must have classes intended to be subclassed, and there's functionality that's only ever used/extended/implemented by a subclass, make it protected. Especially when you have abstract protected methods that act as a poor Java programmer's mixins (and are intended to be called by code in the base class). Sometimes you'll allow (or downright expect) such abstract methods to temporarily violate an invariant in the object's state, as they'll be called as steps in a more complex state transition implemented by an algorithm in a base class method. In these cases, protected access is justified. In other cases, though, you might still consider making some protected methods public.

In conclusion: I'm currently fairly convinced that what package private access is good for is preventing your users from linking their code to useful bits of your code for purposes you didn't anticipate up front; it's often nothing more than an unnecessary garden wall.

Thursday, July 24, 2008

Undeclared, Undefined, Null in JavaScript

So, a coworker of mine ran into a situation recently where they were testing an expression in JavaScript like this:

if(obj.prop != null && obj.prop.subprop > 0) ...

This basically guards against dereferencing "subprop" if "obj.prop" is itself undefined or null. They were running into a situation where the first predicate would pass, yet the second would fail. However, "obj.prop" always printed "undefined". Turns out it actually had the string "undefined" as its value.

Ouch.

Anyway, said coworker pointed me to a blog entry titled Three common mistakes in JavaScript / EcmaScript. Said blog entry says:

if (SomeObject != null) {

Well, in JavaScript, which is a dynamic language, something that has not been assigned to is not null, it's undefined. Undefined is different from null. Why? Don't ask me. Well, anyways, you can use typeof to explicitly check for undefined, or use other more or less clean tricks, but the best way to deal with that is probably to just rely on the type-sloppiness of JavaScript and count on it to evaluate null and undefined as false in a boolean expression, like this:

if (SomeObject) {

It looks uglier, but it's more robust.

I have to disagree. It's not more robust. It will also catch the cases of SomeObject having the numeric value 0, or the empty string as its value, because their boolean coercion is also false. To make matters worse, the original example using SomeObject != null actually works, and in most cases it is actually the most appropriate check!

See, in JavaScript, null and undefined are actually equal according to the == and != operators! (The ECMA-262 standard, section 11.9.3, tells us so.) In the vast majority of cases you don't care about the difference at all, so using someExpr != null is good enough.

If you really-truly must distinguish between undefined and null, you have some options. Curiously, while there is an actual built-in language literal for the null value (namely, null), the sole value of the Null type, there is no built-in language literal for undefined, the sole value of the Undefined type. The identifier "undefined" can be assigned to, so you can't write something as simple as if(x === undefined):

var undefined = "I'm defined now";
var x; // he's really undefined
print(x === undefined); // prints false


Inconvenient, huh? So, how to test for undefined? Well, the common practice found in most books and tutorials on JavaScript seems to be using the built-in typeof operator, but I really don't like it, because the check is implemented by way of a string comparison:

var x;
print(typeof(x) == "undefined");


will actually print true. But as I said, I think it's ugly.

My solution instead relies on the fact that undefined is equal to null, but is not strictly equal to null, therefore this expression also works:

var x;
print(x == null && x !== null);


will also print true, and it involves only two simple comparisons.

Which brings us to the question of what the actual difference is in JavaScript between an undeclared variable, a variable with the undefined value, and a variable with the null value. Let's see:

var x = {}; // empty object
var u; // declared, but undefined
var n = null; // declared, defined to be null

function isUndefined(x) { return x == null && x !== null; }

print(isUndefined(x.x)); // prints true - access to undefined property on an object yields undefined
print(isUndefined(u)); // prints true - declared, but undefined value
print(isUndefined(n)); // prints false - the value is null, not undefined
print(isUndefined(z)); // runtime error -- z is undeclared


So, it's an error to dereference an undeclared variable, "z" in the above example (though it's okay to assign to it, which creates a new global variable). It is not an error to dereference a declared variable with an undefined value, "u" in the above example; its value is the undefined value. Further, access to any undefined property on an object also yields the undefined value, duh. As for null, well, null is just a value like true, or false, or 4.66920166091; it's the single value of the type Null.

Hope this clears up the whole topic of undefined/null values (and undeclared variables) somewhat.

Saturday, June 07, 2008

Objective-J

So, here's some news: the people at 280 Slides created a web application that allows you to build presentations in your web browser. It looks very nice; people are comparing it to Apple's Keynote application. All in all, yet another webapp out there; what's the big deal, right?

Well, the people who created 280 Slides were previously Apple employees. 280 Slides wasn't just written in JavaScript. No. These people created something called Objective-J, which is to JavaScript as Objective-C is to C. Then they implemented part of Apple's Cocoa application framework atop it (naming it Cappuccino), and finally implemented the application atop that.

Now that's quite amazing.

Dion Almaer writes that

Objective-J is the language that takes JavaScript and makes it Objective (as Obj-C did to C). Lots of square brackets. When the browser gets served .j files, it preprocesses them on the fly. This means that you can do things like, use standard JavaScript in places.

Interesting. Objective-J will eventually be open sourced at objective-j.org, and I'll be quite curious to see what they did. I suspect they have a transformer from Objective-J source code to plain JavaScript (presumably itself written in JS), and the source code gets converted to JS in the browser when it's downloaded. But I might be wrong.

Then there's the interesting issue that Objective-C improved C with OO features. But what did Objective-J improve? JavaScript is extremely object-oriented to begin with, so this sounds more as if they wanted to bring the particular Objective-C flavor of OO to JavaScript, because that's what they're comfortable with. They need to drive nails into a different wall now, and they'd still prefer to do it with their old hammer!

Don't get me wrong, I'm not making fun of them. Shaping one's tools in a new environment after the ones you knew and loved in a previous environment is a valid activity if you perceive it as the path that allows you to be most productive. Building a new language atop JS, then an application framework atop that, and then a very usable and visually appealing application on top of it all (very cross-browser compatible, too) earns you a metric shitload of geek cred in my circles. It might turn out to be a catalyst for a lot of similarly nice future webapps. It might turn out to be the next big thing for JavaScript in the browser.

I'm eagerly waiting for content to start popping up at objective-j.org, although of course Objective-J.js can already be readily inspected.

Thursday, June 05, 2008

So, WebKit JS used an AST walker before?

Color me flabbergasted.

WebKit recently announced a new JavaScript interpreter named SquirrelFish, claiming it is much faster than WebKit's previous interpreter.

That's good and all, but in the "Why it's fast" section of the linked article, they say:

Like the interpreters for many scripting languages, WebKit’s previous JavaScript interpreter was a simple syntax tree walker.

It was? Oh my... They also say that:
SquirrelFish’s bytecode engine elegantly eliminates almost all of the overhead of a tree-walking interpreter. First, a bytecode stream exactly describes the operations needed to execute a program. Compiling to bytecode implicitly strips away irrelevant grammatical structure. Second, a bytecode dispatch is a single direct memory read, followed by a single indirect branch. Therefore, executing a bytecode instruction is much faster than visiting a syntax tree node. Third, with the syntax tree gone, the interpreter no longer needs to propagate execution state between syntax tree nodes.

Why, yes, indeed!
For the record, Rhino has been doing this for ages: the AST is compiled to an internal "JS bytecode" format that strips away the grammar, which is then interpreted. It has worked like this since, well, around the turn of the millennium. (Actually, Rhino can kick it up another notch and optionally compile the JS bytecode to Java bytecode, eliminating the interpreter altogether - bytecode which the JVM JIT compiler can then further compile to machine code at its own discretion.)
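
To illustrate the difference the WebKit folks describe (a toy sketch of mine, nothing like either engine's actual code): a tree walker pays a virtual dispatch per grammar node, while a bytecode loop runs off a flat opcode array with a single switch per instruction:

public class DispatchStyles {
    // tree walking: one virtual call per syntax tree node visited
    interface Node { int eval(); }
    static class Lit implements Node {
        final int v;
        Lit(int v) { this.v = v; }
        public int eval() { return v; }
    }
    static class Add implements Node {
        final Node left, right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        public int eval() { return left.eval() + right.eval(); }
    }

    // bytecode: the grammar is stripped away at compile time; execution is
    // just "read opcode, branch on it"
    static final int PUSH = 0, ADD = 1;
    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0;
        for (int pc = 0; pc < code.length; ) {
            switch (code[pc++]) {
                case PUSH: stack[sp++] = code[pc++]; break;
                case ADD:  stack[sp - 2] += stack[sp - 1]; sp--; break;
            }
        }
        return stack[0];
    }

    public static void main(String[] args) {
        Node tree = new Add(new Lit(2), new Lit(3));
        System.out.println(tree.eval());                            // 5, walking the tree
        System.out.println(run(new int[] {PUSH, 2, PUSH, 3, ADD})); // 5, dispatching bytecode
    }
}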

Anyway, I digress. All I wanted to say is that I'm honestly amazed that a supposedly professional implementation of JavaScript (the one shipped in WebKit before SquirrelFish came along and, by consequence, shipped in Apple's Safari browser) would use an AST-walking interpreter. Yeah, I know, you'll say "most scripts run once on the page, so optimization is overkill", but with AJAX this is no longer true, and apparently the WebKit team thinks so too.

On the positive side, they're finally moving on to a better solution, and I congratulate them on the undoubtedly hard decision to finally take the leap toward a more modern execution architecture.

Thursday, May 29, 2008

"Mixed Language Environments" interview

In case you're interested, the video interview where Kresten Krab Thorup interviewed Charlie Nutter, Erik Meijer, and myself about mixed language environments on virtual machines is now online. It was shot at the JAOO conference last September.

Wednesday, April 16, 2008

Interview about Rhino

Floyd Marinescu interviewed me about Rhino at the JAOO conference in Aarhus, Denmark last September. You can watch the interview here. Yes, I know I said "scratch to itch" :-)

I have actually stepped down from the Rhino maintainer role since this interview was shot, although I do remain with the project as a committer.

Friday, March 21, 2008

Xstream with Hibernate

People have been asking in the comments on my post about XStream how to properly integrate Hibernate with XStream, as there are a few pieces of the puzzle that don't necessarily fit together perfectly. I answered in the comments, but Blogger annoyingly doesn't allow much formatting in comments, so I decided to write a separate follow-up post; here it goes.

  • First of all, I use Hibernate 2. I know it's old, but it does the work, and I won't fix what isn't broken. I expect most of this advice will also apply to Hibernate 3.

  • I'm aliasing all classes using XStream.alias(). The reason is that on the server side, some data model classes are actually subclassed for additional (though non-Hibernate-related) functionality. You might or might not need this.

  • This one, however, is essential: we need to ensure that XStream treats Hibernate lists, sets, and maps as Java lists, sets, and maps. I believe this is what helps avoid the infamous "com.thoughtworks.xstream.converters.ConversionException: Cannot handle CGLIB enhanced proxies with multiple callbacks..." problem. There are three things that need to be done:
    • use XStream.addDefaultImplementation() to tell XStream to treat all Hibernate-enhanced collection classes as plain Java collections:

      xstream.addDefaultImplementation(
          net.sf.hibernate.collection.List.class, java.util.List.class);
      xstream.addDefaultImplementation(
          net.sf.hibernate.collection.Map.class, java.util.Map.class);
      xstream.addDefaultImplementation(
          net.sf.hibernate.collection.Set.class, java.util.Set.class);

    • define custom converters that can handle Hibernate collections as Java collections. These converter classes are rather trivial: all they really do is extend XStream's built-in collection and map converters and declare their ability to handle Hibernate lists, sets, and maps:

      import net.sf.hibernate.collection.List;
      import net.sf.hibernate.collection.Set;
      import com.thoughtworks.xstream.converters.collections.CollectionConverter;
      import com.thoughtworks.xstream.mapper.Mapper;

      class HibernateCollectionConverter extends CollectionConverter {
          HibernateCollectionConverter(Mapper mapper) {
              super(mapper);
          }

          public boolean canConvert(Class type) {
              // List and Set here are Hibernate's, courtesy of the imports above
              return super.canConvert(type) || type == List.class || type == Set.class;
          }
      }

      and

      import net.sf.hibernate.collection.Map;
      import com.thoughtworks.xstream.converters.collections.MapConverter;
      import com.thoughtworks.xstream.mapper.Mapper;

      class HibernateMapConverter extends MapConverter {
          HibernateMapConverter(Mapper mapper) {
              super(mapper);
          }

          public boolean canConvert(Class type) {
              return super.canConvert(type) || type == Map.class;
          }
      }

    • finally, register these converters with XStream so it actually uses them:

      Mapper mapper = xstream.getMapper();
      xstream.registerConverter(new HibernateCollectionConverter(mapper));
      xstream.registerConverter(new HibernateMapConverter(mapper));


That's all I did, and it eliminated all of my Hibernate+XStream problems - I hope it helps you too.

Thursday, March 20, 2008

Attributes and items

So, here I am trying to further my metaobject protocol library. I'm now in the eat-my-own-dog-food phase of sorts: after having written a MOP for POJOs, I'm trying to write actual MOPs for some dynamic language implementations, most notably Jython and JRuby. And pretty soon I hit an obvious design problem (which is okay, really - this is still an exercise in exploring the right approach). Namely, lots of languages actually have two namespaces when it comes to accessing data belonging to an object. I'll refer to one of them as "attributes" (as that's what they're called in both Ruby and Python); the others are "items", which are elements of some container object. All objects have attributes, but only some objects (containers) have items. Some languages don't make the distinction, most notably JavaScript. In JavaScript, all objects are containers and they only have items (and an item can be a function, in which case it acts as a method on the object). Other languages (Ruby, Python) do distinguish between the two; the containers are arrays/lists and hashes/dictionaries/maps. As a matter of fact, it helps to think of Java as having the distinction too: JavaBeans properties are the attributes, while arrays, Maps, and Lists have items.

I'd like to think that most people's mental model of objects actually distinguishes the two.

Now, to make matters a bit more complicated, in lots of languages the container API is actually just syntactic sugar. Give an object a [] and a []= method, and it's a container in Ruby! Give it __getitem__, __setitem__, and a few others, and it's a container in Python! Honestly, this is okay - as a byproduct of duck typing, one shouldn't expect there to be any sort of explicit declaration, right?

For ordered containers, most languages also make it possible to manipulate subranges.

Bottom line is, I feel this is a big deal to solve in an interoperable manner, as the raison d'être of this library is to allow interoperability between programs written in different languages within a single JVM. I imagine that in most cases the programs will pass complex data structures built out of lists and dictionaries to one another, so it feels... essential to get this right. It also feels like something that can rightfully belong in a generic MOP, as most languages do have the concept of ordered sequences and associative arrays. Of course, I might also be wrong here; it is also an essential goal not to end up with a baroque specification that contains everything plus the kitchen sink.

So here I am, wondering whether this is something that can be made sufficiently uniform across languages that if, say, a Ruby program is given a Python dictionary and calls []= on it, the call actually ends up being translated into a __setitem__ call. The goal seems worthwhile, and is certainly possible, but I'm not entirely sure how much of an effort it will take. There's only one way to find out, though :-)
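
To give a flavor of what I have in mind, here's a hypothetical sketch (mine, not the actual dynalang-MOP API) of a container facet that a language MOP could implement, so that a Ruby []= and a Python __setitem__ both route through the same setItem() operation:

public interface ContainerHandler {
    // "attributes": the object's own namespace (JavaBeans properties in Java,
    // attributes in Ruby and Python)
    Object getAttribute(Object target, String name);
    void setAttribute(Object target, String name, Object value);

    // "items": the container namespace (obj[key] in most languages)
    Object getItem(Object target, Object key);
    void setItem(Object target, Object key, Object value);

    // ordered containers could additionally expose subrange manipulation
    Object getItemRange(Object target, int fromIndex, int toIndex);
}

A JRuby adapter would implement setItem() by invoking the target's []= method; a Jython adapter would invoke __setitem__; the calling language never needs to know which.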

Friday, February 22, 2008

Microsoft Open Source

I was surprised to discover today that Microsoft has a hosting site for open source projects (analogous to, say, SourceForge.net or GNU Savannah): witness Codeplex. To my further surprise, there's also an open source community site at Microsoft, named Port 25. I haven't had time to investigate either of them more deeply yet, but I plan to. This is intriguing.

(Found them both following Microsoft's Open Source Interoperability Initiative FAQ.)

Mind you, I was aware earlier that Microsoft has software released under OSI-approved licenses; there are more than a few Microsoft-backed projects hosted on SourceForge.net (or have they moved to Codeplex since? Hm...). What I was not aware of, and what I believe is a big deal, is that Microsoft is now providing its own hosted infrastructure for open source project hosting and community discussion.

Wednesday, February 20, 2008

Ignorance is bliss

"Hi. My name's Attila, and I write shitty code."

The latest Really Bad Practice I managed to implement was making some business-level code aware of its underlying infrastructure - in particular, making it aware of Spring's ApplicationContext and such.

Ignorance is bliss, and this goes for code as well. A protocol handler unaware of transport peculiarities can be reused with any transport. Code that is unaware of memory management will automatically benefit from improvements in garbage collection.

The less your code knows about the context it is used in, the easier it is to reuse in a different context, and - even more importantly - the easier it is for the context to manage it as it's supposed to.

With dependency injection (DI) frameworks like Spring, making components aware of the DI machinery is bad, but that won't necessarily become apparent immediately. It will when you want to implement something more involved - say, a graceful shutdown for your service - and suddenly you can no longer have the infrastructure do the work for you. In my particular case, I could no longer rely on the dependency graph maintained by Spring after some of my components directly pulled other components from the application context.
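
In code, the difference looks something like this (a sketch; OrderService and PaymentGateway are made-up names). The first variant reaches down into Spring, so the container never learns about the dependency; the second declares it and stays blissfully ignorant:

import org.springframework.beans.BeansException;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;

interface PaymentGateway { /* made-up collaborator */ }

// what I did: business code that knows about Spring
class OrderService implements ApplicationContextAware {
    private ApplicationContext ctx;

    public void setApplicationContext(ApplicationContext ctx) throws BeansException {
        this.ctx = ctx;
    }

    public void process() {
        // Spring can't see this dependency edge, so it can't order shutdown around it
        PaymentGateway gateway = (PaymentGateway) ctx.getBean("paymentGateway");
        // ... use gateway ...
    }
}

// what I should have done: the dependency is declared, so the container
// knows about it and can manage initialization and shutdown order
class BetterOrderService {
    private final PaymentGateway gateway;

    BetterOrderService(PaymentGateway gateway) { this.gateway = gateway; }

    public void process() { /* ... use gateway ... */ }
}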

Of course it was a stupid thing to do. I usually know better.

As an excuse, let me say I only resorted to this in rather ugly situations: asynchronous callbacks from external systems, arriving through mechanisms that make binding to the infrastructural context "normally" hard; and Java deserialization, the notoriously DI-unfriendly mechanism where you have to resort to either thread locals or statics (which reminds me of Gilad's intriguing new blog post "Cutting out the static", by the way). (Dependency injection into deserialized objects is something the Guice user's guide will also admit is a problem, for which the best and still quite unsatisfying solution is static injection.)

So yeah, I have the excuse of committing the faux pas when faced with a tough situation, but still. (Mind you, eradicating all Spring-awareness alone won't solve my problem of gracefully shutting down a service that might still be waiting for asynchronous responses from an external system, but it would certainly go a long way toward it.)

The lesson, however, is clear: it is often the path of least resistance to reach from your code down into a supporting layer, but it can easily come back to bite you when said layer was meant to be invisible. You think you need to expose only a bit of the plumbing, but as time goes on, you realize that if you continue, you'll end up either uncovering the whole goddamned kitchen sink or having to reimplement some of it. Then it's time to finally notice what you're doing (better late than never, I guess), backtrack, and refactor: bury the infrastructure back where it belongs, not visible at all from the higher level. It sure does make some fringe cases harder to implement, but the important thing is that it keeps the easy bits easy.

Sunday, January 13, 2008

Sim City goes open source

The source code for the original SimCity (the game responsible for countless hours I spent sitting in front of a computer when I was 16) has been released under GPL v3. If you ever played it (you did, right?), you had to admire all the cross-interactions of your planning decisions, and you certainly wondered about the underlying mechanics. Well, now you can read them first hand!
One of the insightful quotes from the announcement:

The modern challenge for game programming is to deconstruct games like SimCity into reusable components for making other games!

That is a very important point, and it illustrates well a unique aspect of open source software: the exponential opportunities for reuse. (Even if I personally believe that licenses more liberal than the GPL contribute more toward this.)

Tuesday, January 08, 2008

Java 6 on Leopard. Well, almost

So, it looks like there's finally a developer preview of Java 6 for Mac OS X Leopard. Am I happy? Nope. Why? Here's why:






 CPU bits \ CPU architecture    PowerPC             Intel
 32-bit                         -                   machines I have
 64-bit                         machines I have     machines running Java 6

It only runs on Macs with a 64-bit Intel CPU. Now, I have a 64-bit PowerPC Mac, and I have a 32-bit Intel Mac, but no 64-bit Intel Mac, so no cake for me. Darn. Hopefully the final release will run on all hardware that Leopard itself runs on.

Thursday, December 27, 2007

Rhino in Spring 1.2

So, I've used some free time around the holidays to put the finishing touches on a new release of Rhino in Spring. The new release is 1.2, and it has one major new feature (as well as a few featurettes). The major big thing: it is now fully clusterable with Terracotta!

Actually, that's pretty much it. There are a few other featurettes as well, but the real big leap forward was the restructuring of the internals so that it plays nicely in Terracotta distributed clusters. Configuration for clustering is dead easy: the release comes with a Terracotta "configuration module", a special JAR file hosting an XML file with the clustering declarations for the code. You just reference it in your tc-config.xml, and that's it. The rest is Terracotta magic :-)

Friday, December 07, 2007

FreeMarker 2.3.11 released

We released FreeMarker 2.3.11 on Wednesday. There are few new feature goodies as well as some significant performance improvements (both speed and memory) in there, not to mention bugfixes, so you might consider upgrading from earlier 2.3.x version. Head over to the project website for more info and download.