Tuesday, May 30, 2006
More blogosphere echoes on continuations in JVM
The following people also reacted to Gilad Bracha's "no continuations in JVM" post (in alphabetical order):
Don Box
Tim Bray
Avi Bryant (developer of Seaside, the continuations-based webapp framework in Smalltalk)
Miguel de Icaza (with a link to Ian Griffiths' entry about continuations being considered harmful)
David Megginson
All these posts revolve around whether webapps are a justifiable reason for bringing continuations to the JVM. Now, I can actually agree they are not. However, there's precious little discussion out there about other reasons that are in fact justifiable. At least on the JVM, writing handlers for non-blocking socket I/O is one very practical usage. Writing distributed systems with execution location transparency and runtime migratability is another. Yet another is writing cooperative-threading systems where some domain-specific guarantee of scheduling fairness is explicitly encoded into the yielding policy of microthreads implemented with continuations. This would allow implementing, say, an MMORPG server in Java, similar to how the EVE Online server side is implemented in Stackless Python - according to its developers, they currently manage 26,000 concurrent users on a 150-CPU cluster. It'd be a nice new server software market for the JVM as well.
Wednesday, May 24, 2006
Gilad Bracha: No continuations in JVM
Seems like Gilad Bracha doesn't want to see continuations implemented in the JVM. Too bad. His reasoning is that the major use for continuations would be web application flows, that web applications increasingly tend toward stateless models, and that only a minority of functionalities need multipage stateful flows.
Well, let's even allow for the moment that he's right about webflows. Even supposing that, there are still lots and lots of valid use cases for continuations in the JVM. Here are a few examples:
- Distributed agents, where execution hops from one machine to another, because it's cheaper to bring the processor to the data than the other way around. As a special case, grid computing.
- Implementing processes with massive parallelism (lots of work units being processed in parallel) but also with some long-blocking points. Say you have batches of 100,000 work units in flight, but they're frequently blocked on something - waiting for user input, or better yet for an external process that gathers complex input from the user, or just waiting for the next processing window in case you're bound to specific time windows instead of operating 24/7. You just can't have 100,000 physical threads. Instead, you use 500 threads, send the work units that block to a database as serialized continuations, and keep the threads busy with those that aren't blocked. At the moment, such systems can be implemented on the JVM by coding, for example, in Rhino - Rhino is a JavaScript interpreter in Java that supports continuations. This is, however, quite unfortunate, as at best you end up mixing Java and JavaScript, and the boundaries between those languages in your system are determined by whether a control flow can lead to suspension of execution via a continuation. If it can, then that control flow path - all the "for" and "if" blocks enclosing it - must be coded in JavaScript; if not, it can be written in Java. As you see, this delineation between implementation languages stems from a purely implementation-specific constraint, and is not something that naturally follows from the architecture of your system, resulting in suboptimal architectural design (and frustration in the architect on whom such a limitation is imposed). If Java supported continuations, the full system could be written in Java, with no need to reach out to JavaScript.
- Protocol handlers for NIO-based servers. Think it's a coincidence we don't have full-fledged HTTP NIO servers in Java? Think again. Even handling the basic HTTP/1.1 handshake - with support for the 100-Continue mechanism and parsing of the headers - is nontrivial if you are forced to code it as a state machine, trust me.
- Cooperative threads. They're sometimes needed - for example, for implementing an MMORPG where you need to be able to guarantee fairness in scheduling. Lots of MMORPGs use Stackless Python for this purpose. They could use Java, if only Java had continuations.
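To make the state-machine pain from the NIO point above concrete, here's a minimal sketch - the toy protocol and all names are mine, purely illustrative. Without continuations, a handler can't just block on a read; every partially arrived message forces you to externalize "where am I in the parse" as explicit state:

```java
// A hand-rolled state machine for parsing a toy request line ("GET /path\r\n").
// With continuations you'd write this as straight-line, blocking-style code;
// without them, every partial read forces the parse position into explicit
// state that survives between feed() calls.
import java.nio.ByteBuffer;

public class RequestLineParser {
    enum State { METHOD, PATH, CR_SEEN, DONE }

    private State state = State.METHOD;
    private final StringBuilder method = new StringBuilder();
    private final StringBuilder path = new StringBuilder();

    // Feed whatever bytes happened to arrive on the channel;
    // returns true once the request line is complete.
    public boolean feed(ByteBuffer buf) {
        while (buf.hasRemaining() && state != State.DONE) {
            char c = (char) buf.get();
            switch (state) {
                case METHOD:
                    if (c == ' ') state = State.PATH; else method.append(c);
                    break;
                case PATH:
                    if (c == '\r') state = State.CR_SEEN; else path.append(c);
                    break;
                case CR_SEEN:
                    if (c == '\n') state = State.DONE;
                    else throw new IllegalStateException("CR not followed by LF");
                    break;
                default:
                    break;
            }
        }
        return state == State.DONE;
    }

    public String method() { return method.toString(); }
    public String path() { return path.toString(); }

    public static void main(String[] args) {
        RequestLineParser p = new RequestLineParser();
        // The request may arrive fragmented across several non-blocking reads:
        boolean done1 = p.feed(ByteBuffer.wrap("GET /in".getBytes()));
        boolean done2 = p.feed(ByteBuffer.wrap("dex.html\r\n".getBytes()));
        System.out.println(done1 + " " + done2 + " " + p.method() + " " + p.path());
    }
}
```

And this is just one request line; scale it up to headers, chunked bodies, and 100-Continue, and the enum explodes while the control flow disappears into the switch.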
There's one more strong reason why Sun should not eschew the idea of continuations in the JVM: continuations are already happening in the .NET space. Not in Microsoft's official implementation, but in Mono - witness Mono Continuations, bringing full continuation support to C#. I have little doubt Microsoft will take this idea from Mono and implement it in mainstream .NET. As with the proper generics implementation (or I could also mention LINQ), the .NET platform would gain yet another innovative advantage over Java, making Java look more and more "so 20th century" in comparison. That's something I'd be worried about if I were Sun's chief Java evangelist, Gilad.
Sunday, May 21, 2006
Slow Bob in the Lower Dimensions
Just discovered this: Slow Bob in the Lower Dimensions is a short psychedelic animation by Henry Selick, actually a pilot for an unfortunately never-realized series (apparently targeted to be shown on MTV). At the moment, Henry is directing (and has co-written the script with Neil Gaiman) the movie adaptation of Gaiman's short children's novel, "Coraline".
Saturday, May 20, 2006
Ok, so which one?
Problem at hand: Apple's line of Intel CPU equipped laptops is now complete, so you'd want to purchase one. Okay, but which one exactly?
The middle MacBook model and the middle MacBook Pro model look to me to have the best bang-for-the-buck value.
The only problem with the MacBook is that I'd essentially have to start by throwing out the 2x256MB RAM modules and installing 2x512MB third-party modules, or alternatively ordering it with at least 1GB BTO (whichever is cheaper). I'm running my current iMac G5 with 1.5GB RAM, and I definitely need at least 1GB. So, that's already upping the price. With a MacBook Pro, I could keep the 1x512MB and stick another 1GB in the second slot for 1.5GB (yeah, I know that using identical modules lets you double RAM transfer rates, but believe me, the extra capacity saves you a lot in transfers to and from a swapfile in return).
Also, I'd probably want to upgrade the HDD to at least 100GB in the MacBook. The only real difference then between the MacBook and the Pro would be in the display size and the graphics chipset. Since I'm not really doing a lot of 3D gaming, the graphics chipset is not much of an issue, and a MacBook with 1GB RAM (80MB gone to the graphics chipset) and an HDD expansion would probably suffice. The display size is not an issue either, as at home I'll be connecting it to an external monitor anyway.
So, all in all, with 2x512MB RAM and a 100GB HDD, a MacBook costs $1600, and the MacBook Pro costs $2200. For $600 extra, you get a bigger screen, a better graphics chipset, and one ExpressCard slot. However, the advertised battery life is 1.5 hours shorter for the MacBook Pro.
Well, looks to me like the MacBook is winning on my pro/contra sheet, especially after applying the $600 saving to the "pro" side. I also just read the Ars Technica review of the MacBook, and they basically conclude the same. They even disclose that the HDD is very easily replaceable, so maybe it's worth not buying an upgraded HDD from Apple, but rather buying a beefy 120GB 7200RPM drive on my own instead.
Friday, May 19, 2006
Recognizing The Way Of The Continuation
Seems like lots of people are coming to the recognition that for really scalable, long-running (and/or many-at-once-running) scenarios, you indeed need to build your system on continuations. Well, unless you want to build it on explicit state machines, like some less fortunate projects do, that is.
My just-launched Rhino in Spring effort has some similarities with, for example, BPMScript, a project I discovered accidentally today - it's a Business Process Management solution that also uses Rhino with continuations to implement scalable long-running processes in a maintainable way (that is, using a high-level programming language to express algorithms instead of explicitly managing a state machine). A prime example of a state-machine-ish BPM is the one JBoss develops. Last year when I attended the JAOO conference, a guy from JBoss gave a presentation on JBoss's BPM and tried to convince us how their "graph oriented programming" (muhaha) is in fact much better for the purpose than object-oriented programming, as object-oriented programming "doesn't support suspend/resume of running processes". I almost fell out of my chair when I heard it. I tried whacking him with the cluebat of continuations after his presentation, but am not sure to this day that I got through. He apparently thought that people who design business workflows like drawing circles and boxes and connecting them with arrows more than writing proper programs in a proper programming language. Well, while there might be truth in that as well, I'm regardless glad to see more and more projects - like BPMScript - step onto the more enlightened Way Of The Continuation :-)
The explicit dealing with a state machine is why it looks to me like it wouldn't make much sense integrating Rhino in Spring (RiS) with Spring Web Flow (SWF), by the way. Based on my current survey of SWF code, it's geared toward state machine execution and can't nicely accommodate a totally different execution paradigm. That's a shame, as there'd be some reusable bits if it weren't engineered with the state machine approach throughout - an approach that assumes the graph of flow states and transitions can be made readily available up front. It can be, when you write your program by coding a state machine directly. However, as I pointed out in this comment on TheServerSide, enumerating all states of a JavaScript program is futile: it's a Turing-complete language, so full state enumeration would be equivalent to solving the halting problem. This also goes to show that with a versatile modern programming language, you can build much more complex flows in a natural manner than by piecing together states and transitions manually. For example, you can bundle data-validation loops, authentication subroutines (say, a set of pages that logs in the user or lets him complete a several-page sign-up process before returning to the task at hand), etc. into functions, and bundle those functions into libraries that are then included from the main flowscripts.
I think it's an incredibly great thing that thanks to Rhino and continuations, more and more Java-based systems can be built without having to make a tradeoff between the comfort of a modern programming language and runtime scalability. We can have both. As usual, the Smalltalk community has known this for decades. Via Rhino, it's finally breaking into JavaScript and Java as well.
Monday, May 15, 2006
Rhino in Spring
So, I've started a new open source project. Well, not really started it just now - it's been sitting in various states of incompleteness on my machine since last August, always waiting for the next chunk of time I could spend on it. I'm pleased to announce it's ready now. So, what's it about?
The short story is, I integrated Rhino with Spring.
The longer story is, I implemented a custom controller for Spring's web application MVC framework that allows you to express, in JavaScript, control flows that span several HTTP request-response cycles (commonly referred to as "webflows").
Below is the text of the announcement as I posted it on TheServerSide (no link, as it didn't show up yet):
A new Apache-licensed open-source project, Rhino in Spring, aims to integrate the Mozilla Foundation's Rhino JavaScript interpreter for Java with the Spring Framework.
The current release includes a controller component for the Spring Web MVC that allows you to express complex multipage flows in your web applications as server-side JavaScript programs.
You can use all the amenities of a full-blown imperative programming language while designing flows. You can write libraries of reusable code encapsulated into functions (e.g. validators), you can use the familiar for(), if(), while(), switch/case etc. statements to express the control flow, and so on.
Rhino in Spring uses Rhino's support for continuations to achieve high scalability - when a script is suspended between a response and the next request in the flow, its state is stored in a continuation (think of it as a snapshot of its stack), taking up no scarce system resources (e.g. no physical threads), allowing for any number of concurrently active flows.
Even better, "old" states are preserved (with configurable expiration policies), so users can go back and forward using the browser's back and forward buttons, or even split the flow in two using the browser's "New Window" menu, and the framework will take care of resuming the server-side script on each request - whether it originates from a backtracked or a split response page - at the correct point, with correct values of all variables, automatically. No need to disable the back button or use custom navigation links on your pages to keep server and browser state in sync.
In addition to in-memory and JDBC server-side storage of states, it even provides a facility for embedding an encoded textual representation of the continuation in the generated webpage, thus moving it to the client and completely eliminating any server-side state storage, for the ultimate in scalability. Compression, encryption, and digital signing can be enabled to protect the client-side stored continuations from tampering. As an added bonus, you also get generic Spring bean factories for Java Cryptography Architecture public and private keys, as well as Java Cryptography Extension secret keys, that you can reuse elsewhere.
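The tamper-protection idea above can be sketched in a few lines. To be clear, this is not Rhino in Spring's actual implementation - just a minimal, hypothetical illustration (all class and method names are mine) of the compress-then-MAC approach for state you hand over to the client:

```java
// Illustrative sketch (not Rhino in Spring's actual code): gzip a serialized
// state blob, append an HMAC-SHA256 tag, and Base64 the result so it can sit
// in a hidden form field. On the way back, the server refuses any blob whose
// tag doesn't match - i.e. anything the client tampered with.
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class ClientStateCodec {
    private final SecretKeySpec key;

    public ClientStateCodec(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    // Compress the state, append its 32-byte MAC, and Base64-encode the lot.
    public String encode(byte[] state) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(state);
        }
        byte[] compressed = bos.toByteArray();
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(key);
        byte[] tag = mac.doFinal(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(compressed);
        out.write(tag); // fixed-size tag at the end
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Verify the MAC before trusting anything the client sent back.
    public boolean verify(String encoded) throws Exception {
        byte[] all = Base64.getDecoder().decode(encoded);
        if (all.length < 32) return false;
        byte[] compressed = java.util.Arrays.copyOfRange(all, 0, all.length - 32);
        byte[] tag = java.util.Arrays.copyOfRange(all, all.length - 32, all.length);
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(key);
        // Constant-time comparison, so the check itself leaks nothing.
        return MessageDigest.isEqual(mac.doFinal(compressed), tag);
    }
}
```

A real deployment would add encryption on top of the MAC (the blob above is merely tamper-evident, not confidential), which is exactly why the project exposes those JCA/JCE key factories.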
Monday, April 10, 2006
Josh Bloch on API design
Came across the slides of a Joshua Bloch talk titled "How to Design a Good API and Why it Matters" that he gave in 2005. I'll just say they're worth reading through.
Wednesday, April 05, 2006
April fool's day come late
Sounds like they're April 1 jokes, but they're not:
Apple officially supports dual-booting Windows and Mac OS on Intel-based Macs.
Microsoft offers its Virtual Server software free, with official support for running Red Hat Linux on it.
My head is spinning... (disclaimer: I can perfectly understand the market motivations for both moves. It's just that I didn't consider them very likely...)
Thursday, March 30, 2006
In UK next week
I'll be in Reading, UK between April 3 and 7. If anyone feels like meeting me over a beer in Reading or London area, drop me a note so we may be able to arrange something.
Tuesday, March 28, 2006
Bleep is about the Copenhagen interpretation
So, two days ago Kriszti and I watched "What the Bleep Do We Know!?™". Based on reviews, I had some good expectations about it, and while I must say my opinion is still quite fuzzy, the prevailing feeling is that of utter disappointment.
The movie does convey positive messages, promotes the benefits of positive thinking, saying that what you think affects who you are and what your destiny will be etc. In this regard, I can completely agree with it.
Then there's the big but.
First, the movie is shot as half documentary, half fiction, densely interspersed. It tries to gain scientific credibility for its message (I'll talk about the actual message in a bit) by having lots of experts (as well as a few "experts") give their opinions in the documentary half, supposedly reflecting on the happenings in the fiction part. However, the things these people say often feel out of context, and they create more confusion than they explain. For a movie supposedly wanting to promote a message and back it up with scientific credentials, the editing is done very poorly (assuming the raw interview material was not as bad in the first place). The Wikipedia page for the movie covers much of the controversy, including one of the interviewed scientists objecting that his interview was edited to look like he supports the movie's claims when he really does not, as well as factual errors, presentation of scientifically unproven experiments as facts, and lots of jumping to conclusions.
So, what's the movie about? Well, the movie bases its message on the Copenhagen interpretation of quantum mechanics. As you might know (and if you don't, go and read the link), in that interpretation observing a quantum phenomenon causes the nondeterministic and irreversible collapse of the wavefunction. This interpretation's problem is that it requires an "observer", thereby introducing consciousness into the theory. The movie goes on to argue that this way our consciousness actively affects the reality that surrounds us by observing it, and hence it jumps to the conclusion that we create that reality. A big negative point for the movie, in my opinion, is that unless you have already studied quantum mechanics, you probably won't understand it. The interviewed persons say "quantum mechanics this" and "quantum mechanics that" all the time, but the explanation of the uncertainty principle is confined to the scene on the basketball court, and while I was watching it I thought that if I didn't know all of this already, I'd be no less in the dark after this movie.
Anyway, I must disagree with the movie's conclusion about our consciousness creating the world around us, and us being indistinguishable from God. These views are very old, by the way. You can go back at least to Baruch Spinoza for the philosophical theory of the unity of nature (humans included) and God. You can refer to either George Berkeley or David Hume for the philosophy of subjective idealism. Nothing new here. Supporting these ideas with the Copenhagen interpretation seems to me a bit of a stretch.
Moreover - and this is my basic reason for disagreeing with the movie - there is a different interpretation of quantum mechanics, the Many-worlds interpretation (MWI), that completely eliminates the need for any sort of observer to collapse the wavefunction, as in this interpretation the wavefunction never collapses. I won't go into explaining MWI here; I'll again direct you to the link above.
Rather, I'll tell you what MWI means to me. MWI, if you subscribe to it, has one very interesting implication: namely, that all possibilities realize themselves at the same time. At a high level, whenever you are in a decision situation, regardless of how you decide, all outcomes will realize themselves in the probabilistic space. As consciousness is widely regarded (though not proven) to be a completely classical phenomenon (in the "classical physics" sense, that is, not at the quantum mechanics level), the linear stream of events you experience as your consciousness is one path through the global wavefunction of the universe. Whenever there is a decision, the path forks, and you experience one of the paths, while multiple "you"s that share your identity up to that point experience the other paths.
What does it mean in practice? It means that when you're hesitating over something - like talking to that attractive girl sitting alone in the bar, or telling your coworker that he's being obnoxious about something, or daring to learn parachuting - you need to realize that you will. And also that you won't. Both. At the same time. With different probabilities, though. You only get to experience one of these paths, and there's no going back and retracing your steps once you do. You can consciously choose the outcome that'd otherwise have the lower probability, leaving the higher-probability but duller options to another you. (Although balancing bravery and foolishness is a good idea generally :-))
Sounds wild, and some will argue that assuming such constant forking is in violation of Occam's Razor, as it creates a continuum of parallel universes. Proponents will argue that there is no such thing - there is only one universe, represented by a single probabilistic wavefunction, with particles exploring all paths through it, and the consciousness you're experiencing being one particular path of the particles making up your physical self at the moment. There's also no information flow sideways or backwards, which is a fond plot device of fiction works involving time travel and/or parallel universes. Proponents will say that the simpler formal expression of this interpretation actually makes it much more in line with Occam's Razor than the Copenhagen interpretation. (Indeed, MWI operates with fewer assumptions, is expressible with more elegance on the mathematical level, and doesn't need the concept of an observer.)
Also, it doesn't clear you of any personal responsibility, as free will is still completely realizable within this framework - remember, consciousness is a classical-physics phenomenon, and regardless of the low-level mechanics and the fact that what you experience as yourself might be taking one path through the wavefunction while other forking selves are experiencing all the other paths, it still makes you responsible for the acts on your path.
Do I personally subscribe to MWI? Well, you see, it's hard to decide. I do. I don't. Both, at the same time :-) It's just a theory, and many regard it as unfalsifiable, which pushes it into the domain of belief rather than science. Sometimes, when things go bad, I can find comfort in thinking that, if this theory holds, then at the same time things also didn't go bad, and some of my probabilistic parallel selves are having it better at the moment.
Tuesday, March 21, 2006
Check out Restlets
For all the fans of the REST approach out there (who also happen to code in Java): seems like someone is working to create a replacement for the Servlet API that is explicitly designed for writing HTTP systems the REST way - check out Restlets. I haven't had a chance to look into it deeply, but it's definitely on my list of things to inspect more closely. The folks wisely created (similarly to the Servlet spec) a separate spec and a separate reference implementation - this is quite important for widespread adoption, as it should allow things like alternate implementations, e.g. one built on top of the Servlet API (to leverage the already-proven infrastructure of the servlet containers out there).
That said, the reference implementation they ship is a completely standalone server - real men handle their port 80 directly :-). Oh, the RI also uses FreeMarker as the default view technology, as a replacement for JSP :-). Well, I guess that makes me ever so slightly biased.
Wasting time on debugging memory errors again
I'm again losing days of work on debugging an OutOfMemoryError in a production system. The tricky part is that the code implements a very thin wrapper over a database, bulk-processes messages, and is totally stateless. The software stack is the JVM, then the JDBC driver, then Hibernate, then Spring, then my code. There's no memory leak - I could confirm this much with a profiler. Whatever was causing the trouble was allocating temp objects held by references on the stack, so when the OutOfMemoryError unwound the stack, the smoking gun was gone...
Finally, I turned to JDK 6.0. It's in beta at the moment, but it has a very useful feature: a command line switch "-XX:+HeapDumpOnOutOfMemoryError" that causes a full heap dump (in HPROF heap dump format) whenever an OutOfMemoryError is thrown. After having the ops guys install JDK 6.0 on the machine, I restarted the software under it with the abovementioned switch, sat back, and waited for a memory error with a grin. And waited. And waited some more. Finally, I had waited for more than two hours while the system was running at full load. Nothing.
To my utter surprise, the memory error doesn't manifest itself when running under JDK 6.0, even after a few hours of fully stressed operation. Damn. Isn't it typical? Maybe we have again hit a JDK-specific memory bug that got fixed in this later JDK? Unfortunately, I really cannot seriously propose to my colleagues that we run our production systems on a beta JDK...
Anyway, "-XX:+HeapDumpOnOutOfMemoryError" sounds like something that should have been part of Java long, long ago. Big enterprise systems run into memory problems. That's a fact. There are few tasks as frustrating as trying to isolate them, as the problem inherently manifests itself nonlocally. To have the JVM dump a heap snapshot at that point is invaluable. Not having this feature has caused me one sleepless night too many by now. I heard YourKit will have (or already has?) the ability to analyze HPROF snapshots, which would be really dandy for excavating the results. Failing that, I can still use the HAT profiler - hopefully they have incorporated the patches I sent them over the past year :-)
Sunday, February 19, 2006
Dilbert: Land of unrealistic business assumptions
Scott Adams constantly proves how his work on the Dilbert comic strip should be required reading for businessmen and managers, but I think this most recent storyline beats everything I've read so far. The first piece contains a brilliant "strange loop" reasoning from Dogbert (as well as a nod to the Chronicles of Narnia, but that's sort of beside the point), and the following pieces (two so far) are just cruelly on point. Whenever anyone tries to sell you a failsafe business plan, make sure they read this first :-)
Friday, February 17, 2006
Political alignment
For what it's worth...
You are a Social Liberal (80% permissive) and an Economic Liberal (36% permissive).
You are best described as a:
You exhibit a very well-developed sense of Right and Wrong and believe in economic fairness.
loc: (112, -50) modscore: (22, 48) raw: (2646)
Link: The Politics Test on Ok Cupid
Also: The OkCupid Dating Persona Test
Thursday, February 16, 2006
Wore out a Mighty Mouse in 3.5 months
My iMac was shipped with a Mighty Mouse back in November last year. I'm sorry to report that I wore out its little scroll ball in only three and a half months of use - it no longer gives a clicky sound when I'm rolling it down, and accordingly doesn't detect the roll. (Scrolling up, left, and right still functions normally; only the most heavily used scroll down doesn't.) At a $49 retail price this is no cheap mouse, so I kind of expected it to sustain more wear... I just phoned the Apple dealership and they promised the service guys will look at it...
Wednesday, February 15, 2006
FreeMarker Blog
There's now a FreeMarker Blog for all those people who want to keep an eye on the events related to the FreeMarker project - it's a "groupblog" to which yours truly, as well as other active FreeMarker developers, will be contributing.
Tuesday, February 14, 2006
Spurious wakeup of Java threads
Vlad Roubtsov posted a message today on the ADVANCED-JAVA list saying he noticed that the JDK 1.5 documentation for the java.lang.Object wait() method now contains this bit:
"A thread can also wake up without being notified, interrupted, or timing out, a so-called spurious wakeup. While this will rarely occur in practice, applications must guard against it by testing for the condition that should have caused the thread to be awakened, and continuing to wait if the condition is not satisfied. In other words, waits should always occur in loops, like this one:
synchronized (obj) {
    while (<condition does not hold>)
        obj.wait(timeout);
    ... // Perform action appropriate to condition
}
(For more information on this topic, see Section 3.2.3 in Doug Lea's "Concurrent Programming in Java (Second Edition)" (Addison-Wesley, 2000), or Item 50 in Joshua Bloch's "Effective Java Programming Language Guide" (Addison-Wesley, 2001).)"
Now, I always used while() instead of if(), because even before I knew about this possibility of "spurious wakeups", I was always a bit paranoid about the reliability of any execution environment my code could run in. Nevertheless, it is now a documented best practice :-)
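Here's the idiom fleshed out into a minimal, self-contained example (the class and its names are mine, not from the JDK docs): a one-slot blocking mailbox whose taker re-checks its condition in a loop after every wakeup, so a spurious wakeup - or a wakeup meant for a different condition on the same monitor - is harmless.

```java
// A tiny blocking mailbox demonstrating the guarded wait loop:
// the waiter re-tests "is there a message?" after every wakeup
// instead of assuming the notify implies the condition holds.
public class Mailbox {
    private final Object lock = new Object();
    private String message; // null means "empty"

    public void put(String msg) {
        synchronized (lock) {
            message = msg;
            lock.notifyAll(); // wake any waiting takers
        }
    }

    public String take() throws InterruptedException {
        synchronized (lock) {
            while (message == null) { // loop, never a bare if
                lock.wait();
            }
            String msg = message;
            message = null;
            return msg;
        }
    }

    public static void main(String[] args) throws Exception {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> box.put("hello"));
        producer.start();
        System.out.println(box.take()); // prints "hello"
        producer.join();
    }
}
```

If take() used `if` instead of `while`, a spurious wakeup on an empty mailbox would return null - exactly the bug the JDK doc warns about.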
Monday, February 06, 2006
Magyar Crok
Found these on my kitchen table this morning; they presumably belong to my 6-year-old son. You need to be Hungarian to understand the culture shock. (Click on the image to see the unobscured version on Flickr.)
Thursday, February 02, 2006
Progressive Boink
If you've been living under a rock and don't know Calvin and Hobbes yet, then "25 Great Calvin & Hobbes Strips" can serve as a great introduction, as it comes with commentaries. If you are a Calvin and Hobbes fan, it is still worth checking out, because some of the commentaries are outstanding.
Needless to say, if you're a serious fan - so to say, a "fan-atic" of Bill Watterson's work like I am - you of course already own a copy of "The Complete Calvin and Hobbes", a 3-volume, 11 kg beauty, and keep it in a central place on your bookshelf :-)
Monday, January 30, 2006
My brand new pointless obsession
The press is loud with performance comparisons of iMac G5 and iMac Intel systems, generally showing a mixed picture with the average advantage being on Intel's side. I was wondering how much compiler optimizations contribute to the performance differences. I mean, stock PPC binaries for commercial apps, and even Mac OS X itself, clearly can't be fully optimized for the G5, given that they need to run on G3 and G4 processors as well.
Looking at the default Xcode settings for the "Cocoa Application" project template, it defaults to instruction scheduling optimization for the G4, not the G5, although you can switch it to G5 (that equals -mtune=970 in GCC, I assume). In Xcode, the GCC switch for enabling G5-specific instructions isn't even easily accessible! You need to rewrite the architecture string from the generic "ppc" to "ppc970", and sneak -mcpu=970 into the "Other C flags" setting, all manually -- the Xcode GUI won't assist you in any of this. Then again, probably no commercial software out there uses these settings, precisely so that it can still run on a pre-G5 CPU as well.
So, I'm currently obsessed with the following question: is it possible to build "universal binaries" that carry not just two code versions - generic PPC and Intel - but three of them: generic PPC, PPC970, and Intel? I verified that Xcode has no problem building the code if I modify the architecture setting to "ppc ppc970 i386". I could confirm that the generated executable indeed has these three platforms in it. However, the code was rather trivial, and - except for a few debug symbols - the generic PPC and the G5 code were identical.
So, my current challenge is to create an executable that hosts both PPC and PPC970 architecture code (as a matter of fact, Intel is pretty much irrelevant for this experiment), have the PPC970 code be fully optimized for the G5, and have Mac OS X load it instead of the generic PPC code on a G5 CPU.
Of course, I have very limited time to spend on this newly acquired "hobby", since it has nothing to do with things I do for a living, meaning I get to devote time to it only when I manage to sneak away from the family for half an hour in the evenings. It's however a good vector for getting myself acquainted with generic hacking on this new OS platform. I really need a bit of an intellectual change of air, as I've been too Java-focused for the last 7 years.