Tuesday, March 17, 2009

Relativity of simultaneity

A problem with staying in a profession for long years is that there's less and less things that can truly excite or surprise you as the time passes. That's why I was particularly delighted to experience a moment of sudden enlightenment while listening to Rich Hickey talk about "Persistent Data Structures and Managed References" last thursday at QCon in London. The title of the talk might not sound exciting, but I have to tell you, if this were the only talk I attended, it alone would have been worth attending. I don't say this often. Slides are here, although you're missing out Rich's incredibly good verbal presenting style if you just read them, obviously.

Rich's view (with which I have to agree; in the time I knew Rich, I have yet to hear the first thing from him I'd disagree with, whether it be concurrency, memory models, or ideas for what to do in San Francisco if you only have a single day to explore it); so, Rich's view is that the idea of variables in traditional programming languages is broken, because they are based on a single thread of control at a time. This used to be true in single-threaded programs, and even to a degree for multi-threaded programs running on single-CPU systems.

He argues that we need well defined time models for state changes in our concurrent computational systems. A lack of a time model fails to capture the situation where two observers (threads, CPUs) can observe two events (changes of state) in different order.

Then it suddenly hit me! I learned this stuff in school! Relative timelines, observers... Rich is talking about relativity of simultaneity!

(Which is one of the simpler conseqeunces of Einstein's special theory of relativity.)

Wait a moment. Finding such a surprising parallel analogy between physical world surrounding us and our digital computational systems seems at first unlikely, but thinking more of it, it makes perfect sense. For event timelines to appear differently to different observers, we only need two prerequisites: truly parallel processing, and finite speed of change propagation. Both are true for the real world (quite massively parallel, and the changes propagate at at most the speed of light), and for digital computational systems (parallel already at two CPU cores, with changes occurring in CPU registers and needing time to propagate downstream the hardware memory model to be observable by other CPUs).

One of the wonderful things about Rich is that he's able to express these notions very clearly. It is all really obvious when you think about it, and I couldn't even say I wasn't aware of it for at least three years now, it's just that before hearing this talk, I always had a very slight hope that someday, someone (infinitely wiser than me) will somehow be able to, you know, solve this. Eliminate this problem. The thing I realize now is that it's inherent. You can no sooner eliminate relativity of simultaneity and the rest of consequences it brings to the table from our computational systems than you could cancel its effects in the physical world.

Rich does the only sane thing to do with a problem you can't eliminate - he embraces it. His programming language, Clojure, is an attempt at being the easiest way to write correct concurrent programs. Emphasis on all of "easy to write", "concurrent", and "correct". I learned about Clojure sometime last year when it came up at the jvm-languages mailing list. I read up the documentation on the website, and came away thoroughly impressed. Then I met Rich and saw a 30-minute Clojure presentation last year at the JVM language summit, saw the 60-minute version of the same talk now at QCon, and I'm still impressed. Clojure embraces the problem of concurrent state modifications by encapsulating the state into "references" - the only mutable data types in Clojure, which all point to immutable values. The references can change in time though to point to new values, and the big idea here is that each reference has defined concurrency semantics for how and when change occurs.

Now, there's likely a huge number of temporal semantics of how can some entity's state change. Clojure identifies several usual semantic categories though, and provides them for ease of use. We have synchronous coordinated (Clojure has a STM for that), synchronous uncoordinated, asynchronous uncoordinated, as well as isolated (thread local). I think you can build whatever you need using those.

Anyhow, during Rich's talk I had a moment of enlightenment. Not because I heard something entirely new, but because the pieces of the jigsaw puzzle finally fell into place and revealed a picture. The picture does away with a false hope I had, which is good as it is a clear and definite answer to a question, and having accepted it it is at least clear to me what's the direction forward. During the rest of the day, I was telling pretty much everyone I met and their dog about how I was shown the inherent manifestation of relativity in finite-speed concurrent systems. (Joe Armstrong was just smirking at it, probably thinking "you wet-behind-the-ears kids, this should've been obvious to you for decades", and then we proceeded making fun of all n-phase commit protocols, for any finite value of n).

On multicore systems, there's no longer absolute time. Every core runs on its own relative time and there's no definite sequencing of events. It's time we build this assumption into our code, and not use tools and methodologies that are ignorant of this fact. And as Rich would point it out, writing your code in Clojure is a good way towards the goal.