Tuesday, May 30, 2006

More blogosphere echoes on continuations in JVM

The following people also reacted to Gilad Bracha's "no continuations in JVM" post (in alphabetical order):

Don Box

Tim Bray

Avi Bryant (developer of Seaside, a continuation-based webapp framework in Smalltalk)

Miguel de Icaza (with a link to Ian Griffith's entry about continuations being considered harmful)

David Megginson

All these posts revolve around whether webapps are a justifiable reason for bringing continuations to the JVM. Now, I can actually agree they are not. However, there's precious little discussion out there about the other reasons that are in fact justifiable. At least on the JVM, writing handlers for non-blocking socket I/O is one very practical use. Writing distributed systems with execution location transparency and runtime migratability is another. Yet another is writing cooperative-threading systems where some domain-specific guarantee of scheduling fairness is explicitly encoded into the yielding policy of microthreads implemented by continuations. This would allow implementing e.g. an MMORPG server in Java, similar to how the EVE Online server side is implemented in Stackless Python; according to its developers, they currently manage 26000 concurrent users on a 150 CPU cluster. It'd be a nice new server software market for the JVM as well.
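To make the cooperative-threading point a bit more concrete, here is a minimal sketch in Python, with generators standing in for continuation-based microthreads. The names `player` and `run_round_robin` are made up for illustration, and strict round-robin is just one possible fairness policy, not what EVE Online or any other system actually does.

```python
from collections import deque


def player(name, actions, log):
    """A microthread: performs one action per turn, then yields control."""
    for i in range(actions):
        log.append((name, i))
        yield  # cooperative yield point; the scheduler decides who runs next


def run_round_robin(tasks):
    """Run all tasks to completion, giving each one step per round."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task until its next yield
        except StopIteration:
            continue            # task finished; drop it from the queue
        ready.append(task)      # otherwise it goes to the back of the queue


log = []
run_round_robin([player("a", 3, log), player("b", 1, log), player("c", 2, log)])
```

Because every task yields after exactly one action and the scheduler always takes from the front of the queue, no task can get two turns before the others have had one. That guarantee lives in ordinary application code, which is the whole point.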

Wednesday, May 24, 2006

Gilad Bracha: No continuations in JVM

Seems like Gilad Bracha doesn't want to see continuations implemented in JVM. Too bad. His reasoning is that the major use for continuations would be web application flows, and that web applications increasingly tend toward stateless models, and only a minority of functionalities need multipage stateful flows.

Well, let's even allow for the moment that he's right about webflows. Even so, there are still lots and lots of valid use cases for continuations in the JVM. Here are a few examples:

  • Distributed agents, where execution hops from one machine to another, because it's cheaper to bring the processor to the data than the other way round. As a special case, grid computing.

  • Implementing processes with massive parallelism (lots of work units being processed in parallel) but also with some long-blocking points. Say you have batches of 100 000 work units in flight, but they're frequently blocked on something, e.g. waiting for user input (or better yet, for an external process that communicates with the user to gather complex input), or just waiting for the next processing window in case you're bound to specific time windows instead of operating 24/7. You just can't have 100 000 physical threads. No, you use 500 threads, send the work units that block to a database as serialized continuations, and keep the threads busy with those that aren't blocked. At the moment, such systems can be implemented on the JVM by coding e.g. in Rhino - Rhino is a JavaScript interpreter written in Java that supports continuations. It is however quite unfortunate, as at best you end up mixing Java and JavaScript, and the boundaries between those languages in your system are determined by whether a control flow can lead to suspension of execution via a continuation. If it can, then that control flow path - all the "for" and "if" blocks enclosing it - must be coded in JavaScript; if not, it can be written in Java. As you see, this delineation between implementation languages stems from a purely implementation-specific constraint and not from the architecture of your system, resulting in suboptimal architectural design (and a frustrated architect on whom the limitation is imposed). If Java supported continuations, the full system could be written in Java, with no need to reach out to JavaScript.

  • Protocol handlers for NIO-based servers. Think it's a coincidence we don't have full-fledged HTTP NIO servers in Java? Think again. Even handling the basic HTTP/1.1 handshake - with support for the 100-Continue protocol and parsing of the headers - is nontrivial to do if you are forced to code it as a state machine, trust me.

  • Cooperative threads. They're sometimes needed, e.g. for implementing an MMORPG where you need to be able to guarantee fairness in scheduling. Lots of MMORPGs use Stackless Python for this purpose. They could use Java, if only Java had continuations.
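As an illustration of the protocol handler case from the list above, here is a rough sketch of what the header parsing could look like if the handler could simply suspend whenever it runs out of bytes. It is written in Python, with a generator standing in for a continuation; `parse_request_head` and `feed` are hypothetical names, `feed` is only a stand-in for a real non-blocking I/O loop, and the parsing is deliberately simplified (no folded headers, no body handling).

```python
def parse_request_head():
    """Coroutine: receives byte chunks via send(), returns (request_line, headers)."""
    buf = b""
    while b"\r\n\r\n" not in buf:
        chunk = yield            # suspend until the I/O loop has more bytes
        buf += chunk
    head, _, _rest = buf.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def feed(parser, chunks):
    """Stand-in for a selector loop: push chunks as they 'arrive'."""
    next(parser)                 # run the parser up to its first suspension
    for chunk in chunks:
        try:
            parser.send(chunk)
        except StopIteration as done:
            return done.value    # the parser returned a complete head
    return None                  # head not complete yet


chunks = [
    b"PUT /upload HT",
    b"TP/1.1\r\nHost: example.org\r\nExp",
    b"ect: 100-continue\r\n\r\n",
]
request_line, headers = feed(parse_request_head(), chunks)
needs_continue = headers.get("expect", "").lower() == "100-continue"
```

The parser reads like straight-line blocking code even though the bytes arrive in arbitrary fragments; the "which state am I in" bookkeeping that a hand-written state machine needs is carried implicitly by the suspended execution.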

There's one more strong reason why Sun should not eschew the idea of continuations in the JVM: continuations are already happening in the .NET space. Not in Microsoft's official implementation, but in Mono - witness Mono Continuations, bringing full continuation support to C#. I don't think Microsoft will pass up this idea from Mono; I expect them to implement it in mainstream .NET. As with the proper generics implementation, or LINQ for that matter, the .NET platform will gain yet another innovative advantage over Java, making Java look more and more "so 20th century" in comparison. That's something I'd be worried about if I were Sun's chief Java evangelist, Gilad.

Sunday, May 21, 2006

Slow Bob in the Lower Dimensions

Just discovered this: Slow Bob in the Lower Dimensions is a short psychedelic animation by Henry Selick, actually a pilot for an unfortunately never realized series (apparently targeted to be shown on MTV). At the moment, Henry is directing (and has co-written its script with Neil Gaiman) the movie adaptation of Gaiman's short children's novel, "Coraline".

Saturday, May 20, 2006

Ok, so which one?

Problem at hand: Apple's line of Intel CPU equipped laptops is now complete, so you'd want to purchase one. Okay, but which one exactly?

The middle MacBook model and the middle MacBook Pro model look to me to have the best bang-for-buck value.

The only problem with the MacBook is that I'd essentially have to start by throwing out the 2x256MB RAM modules and installing 2x512MB third-party modules in it, or alternatively order it with at least 1GB BTO (whichever is cheaper). I'm running my current iMac G5 with 1.5GB RAM, and I definitely need at least 1 GB. So, that's already upping the price. With a MacBook Pro, I could keep the 1x512 MB and stick another 1GB in the second slot for 1.5GB (yeah, I know that using identical modules allows you to double RAM transfer rates, but believe me, 1.5GB saves you a lot in transfers to and from a swapfile in return).

Also, I'd probably want to upgrade the HDD to at least 100 GB in the MacBook. The only real difference then between the MacBook and the Pro would be in the display size and the graphics chipset. Since I'm not really doing a lot of 3D gaming, the graphics chipset is not much of an issue, and a MacBook with 1GB RAM (80MB gone to the graphics chipset) and a HDD expansion would probably suffice for me. The display size is not an issue, as at home I'll be connecting it to an external monitor anyway.

So, all in all, with 2x512MB RAM and 100GB HDD, a MacBook costs $1600, and the MacBook Pro costs $2200. For $600 extra, you get a bigger screen, a better graphics chipset, and one ExpressCard slot. However, the advertised battery life is 1.5 hours shorter for the MacBook Pro.

Well, looks to me that the MacBook is winning on my pro/contra sheet, especially after applying a $600 saving to the "pro" side. I also just read the Ars Technica review of the MacBook, and they basically conclude the same. They even disclose that the HDD is very easily replaceable, so maybe it's worth not buying an upgraded HDD from Apple, but rather buying a beefy 120GB 7200RPM drive on my own instead.

Friday, May 19, 2006

Recognizing The Way Of The Continuation

Seems like lots of people are coming to the recognition that for really scalable long-running (and/or many-at-once-running) scenarios, you indeed need to build your system on continuations. Well, except if you want to build it on explicit state machines, like some less fortunate projects do, that is.

My just-launched Rhino in Spring effort has some similarities with e.g. BPMScript, a project I discovered accidentally today - it's a Business Process Management solution that also uses Rhino with continuations to implement scalable long-running processes in a maintainable way (that is, using a high-level programming language to express algorithms instead of explicitly managing a state machine). A prime example of a state machine-ish BPM is the one JBoss develops. Last year when I attended the JAOO conference, a guy from JBoss gave a presentation on JBoss's BPM and tried to convince us how their "graph oriented programming" (muhaha) is in fact much better for the purpose than object oriented programming, as object oriented programming "doesn't support suspend/resume of running processes". I almost fell out of my chair when I heard it. I tried whacking him with the cluebat of continuations after his presentation, but am not sure to this day that I got through. He apparently thought that people who design business workflows like drawing circles and boxes and connecting them with arrows more than writing proper programs in a proper programming language. Well, while there might be truth in that as well, I'm regardless glad to see more and more projects - like BPMScript - step on the more enlightened Way Of The Continuation :-)

The explicit dealing with a state machine is why it looks to me that it wouldn't make much sense to integrate Rhino in Spring (RiS) with Spring Web Flow (SWF), by the way. Based on my current survey of the SWF code, it's geared toward state machine execution and can't nicely accommodate a totally different execution paradigm. That's a shame, as there'd be some reusable bits if it weren't engineered around the state machine approach throughout, assuming that the graph of flow states and transitions can be made readily available up front. It can when you write your program by coding a state machine directly. However, as I pointed out in this comment on TheServerSide, enumerating all states of a JavaScript program is futile: it's a Turing-complete language, so full state enumeration would amount to solving the halting problem. This also goes to show that with a versatile modern programming language, you can build much more complex flows in a natural manner than by piecing together states and transitions manually. E.g. you can bundle data-validation loops, authentication subroutines (e.g. a set of pages that logs in the user or allows him to complete a several-pages-long sign-up process before returning to the task at hand), etc. into functions, and bundle those functions into libraries that are then included from the main flowscripts.
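The point about bundling validation loops and authentication subroutines into reusable functions can be sketched roughly like this. This is Python rather than a Rhino flowscript, with generators standing in for continuations; every name here (`ask_until_valid`, `login`, `checkout_flow`, `drive`) is made up for the illustration, and `drive` merely simulates the request/response cycles.

```python
def ask_until_valid(page, is_valid):
    """Reusable validation loop: re-show the page until the input passes."""
    answer = yield page
    while not is_valid(answer):
        answer = yield page + " (invalid, try again)"
    return answer


def login():
    """Reusable authentication subflow spanning two pages."""
    user = yield from ask_until_valid("username?", lambda s: bool(s))
    yield from ask_until_valid("password?", lambda s: len(s) >= 4)
    return user


def checkout_flow():
    """Main flow: calls the subflows like ordinary functions."""
    user = yield from login()
    qty = yield from ask_until_valid("quantity?", str.isdigit)
    return f"{user} ordered {int(qty)}"


def drive(flow, inputs):
    """Simulate request/response cycles; returns (pages shown, final result)."""
    pages = [next(flow)]
    for value in inputs:
        try:
            pages.append(flow.send(value))
        except StopIteration as done:
            return pages, done.value
    return pages, None


pages, result = drive(checkout_flow(), ["ann", "abc", "abcd", "two", "2"])
```

Nowhere is the set of states or transitions written down; the loops and calls imply them. Expressing the same flow as an explicit graph would mean a state per page plus retry edges, repeated for every place the login subflow is used.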

I think it's an incredibly great thing that thanks to Rhino and continuations, more and more Java-based systems can be built without having to make a tradeoff between the comfort of a modern programming language and runtime scalability. We can have both. As usual, the Smalltalk community has known this for decades. Via Rhino, it's finally breaking into JavaScript and Java as well.

Monday, May 15, 2006

Rhino in Spring

So, I've started a new open source project. Not really started it now, as it's been sitting in various states of incompleteness on my machine since last August, always waiting for the next chunk of time I could spend on it. Well, I'm pleased to announce it's ready now. So, what's it about?

The short story is, I integrated Rhino with Spring.

The longer story is, I implemented a custom controller for Spring's web application MVC framework that allows you to express in JavaScript control flows that span several HTTP request-response cycles (commonly referred to as "webflows").

Below is the text of the announcement as I posted it on TheServerSide (no link, as it didn't show up yet):

A new Apache-licensed open-source project, Rhino in Spring aims to integrate the Mozilla Foundation's Rhino JavaScript interpreter for Java with the Spring Framework.

The current release includes a controller component for the Spring Web MVC that allows you to express complex multipage flows in your web applications as server-side JavaScript programs.

You can use all the amenities of a full-blown imperative programming language while designing flows. You can write libraries of reusable code encapsulated into functions (e.g. validators), you can use the familiar for(), if(), while(), switch/case etc. statements to express the control flow, and so on.

Rhino in Spring uses Rhino's support for continuations to achieve high scalability - when a script is suspended between a response and the next request in the flow, its state is stored in a continuation (think of it as a snapshot of its stack), taking up no scarce system resources (i.e. no physical threads), allowing for any number of concurrently active flows.
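The idea of parking a suspended script between requests can be sketched as follows. This is not the Rhino in Spring API; it is a Python analogy in which a suspended generator plays the role of the stored continuation, and `FlowController`, `start`, `resume` and `greeting_flow` are hypothetical names chosen for the example.

```python
import itertools


class FlowController:
    """Keeps suspended flows in memory between requests, keyed by a flow id."""

    def __init__(self, flow_factory):
        self._flow_factory = flow_factory
        self._suspended = {}             # flow id -> suspended generator
        self._ids = itertools.count(1)

    def start(self):
        flow = self._flow_factory()
        page = next(flow)                # run until the first page is produced
        flow_id = next(self._ids)
        self._suspended[flow_id] = flow
        return flow_id, page

    def resume(self, flow_id, form_value):
        flow = self._suspended[flow_id]
        try:
            return flow.send(form_value)     # continue right after the last yield
        except StopIteration as done:
            del self._suspended[flow_id]     # flow finished; free its state
            return done.value


def greeting_flow():
    name = yield "What is your name?"
    colour = yield f"Hi {name}, favourite colour?"
    return f"{name} likes {colour}"


controller = FlowController(greeting_flow)
a, page_a = controller.start()
b, page_b = controller.start()
reply_b = controller.resume(b, "Bob")      # flows can be resumed in any order
reply_a = controller.resume(a, "Alice")
done_a = controller.resume(a, "green")
```

No thread is held while a flow waits for its next request; a waiting flow is just an entry in a dictionary. One caveat of the analogy: a Python generator can only be resumed once from a given point, so this sketch does not model resuming an older state a second time.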

Even more so, "old" states are preserved (with configurable expiration policies), so the users can go back and forward using the browser's back and forward buttons, or even split the flow in two using the browser's "New Window" menu, and the framework will take care of resuming the server-side script on each request originating from a backed or split response page at the correct point, with correct values of all variables automatically - no need to disable the back button or use custom navigation links on your pages to keep server and browser state in sync.

In addition to in-memory and JDBC server-side storage of states it even provides a facility for embedding an encoded textual representation of the continuation in the generated webpage, thus moving it to the client and completely eliminating any server-side state storage for the ultimate in scalability. Compression, encryption and digital signing can be enabled to protect the client-side stored continuations from tampering. As an added bonus, you also get generic Spring bean factories for Java Cryptography Architecture public and private keys as well as Java Cryptography Extension secret keys, that you can also reuse elsewhere.
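To give a feel for the client-side storage idea, here is a small sketch of compressing a piece of flow state, protecting it against tampering, and turning it into text that could be embedded in a page. Several things here are assumptions for the sake of the example: a plain dictionary stands in for a serialized continuation, an HMAC with a shared secret stands in for the digital signing mentioned above, encryption is left out entirely (so the client could still read the state), and `SECRET_KEY`, `encode_state` and `decode_state` are made-up names.

```python
import base64
import hashlib
import hmac
import json
import zlib

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical server-side secret


def encode_state(state):
    """Compress the state, prepend an HMAC tag, and make it text-safe."""
    payload = zlib.compress(json.dumps(state, sort_keys=True).encode("utf-8"))
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + payload).decode("ascii")


def decode_state(token):
    """Verify the tag before trusting anything that came back from the client."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    tag, payload = raw[:32], raw[32:]        # SHA-256 tags are 32 bytes long
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("state was tampered with")
    return json.loads(zlib.decompress(payload).decode("utf-8"))


state = {"step": "confirm", "cart": ["book", "pen"], "total": 17}
token = encode_state(state)      # would be embedded in the generated page
restored = decode_state(token)   # would run when the next request comes in
```

The tag is checked before decompression or parsing, so modified input is rejected before any of it is interpreted. With the state travelling in the page, the server keeps nothing per flow, which is where the scalability claim comes from.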