Wednesday, June 14, 2006

Accommodating everyday parallel computing

A bit more than a year ago, Herb Sutter published "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software" in Dr. Dobb's Journal (which is incidentally the only printed computer magazine I buy). If you don't know this article yet, go and read it, I'll wait right here.

In case you're hasty and decided to skip it in spite of better advice, here's the summary: the trend over the years was that you could afford to keep writing ever less efficient code and have it compensated for by increases in hardware processing power. That trend is over. The reason: CPUs today are no longer made faster by increasing their linear processing speed. CPU manufacturers have lately run into serious technical difficulties on that route. Instead, processing power is increased by adding multiple cores. The bad news is that your sequential, single-threaded algorithm won't automatically benefit from this the way it did from clock speed bumps in past years.

So, what's to be done? Ignoring the problem is one solution, but computers with multicore CPUs are already widely available on the market. I'm typing this on an Intel Core Duo system myself. Running a test program with an infinite loop won't max out the CPU utilization on this gear: it'll push it up to 50%, and I need to run another instance of the program to push the CPU all the way to 100%. If you ignore the problem, you'll produce software that can only utilize 50% of the CPU resources. People will inevitably find it slow over time and switch to a competitor's software whose author chose not to ignore the problem.

Another possibility is manual parallelization. Identify hotspots in your code. Rewrite them to be multithreaded. Use mutexes, locks, the whole shebang of multithreaded programming. If you have an array of 1 million elements (a big sound sample, maybe, or a picture), chunk it up and feed each chunk to a different thread. Even better than chunking it into two equal parts on a 2-CPU system, code a producer-consumer pattern that chunks it into many small pieces and feeds them to threads adaptively. Of course, your code increases in complexity considerably. Of course, your program might incur runtime overhead for spawning new threads. And then there's the fact that concurrent programming at a low level, using threads and locks explicitly, is hard. It is easy to get wrong, creating race conditions and deadlocks at runtime. So, is this solution ideal? Far from it.
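To make the chunking idea concrete, here's a minimal sketch in Python using a thread pool as the "consumer" side of the producer-consumer pattern. The names (`scale_chunk`, `parallel_scale`) and the chunk size are my own illustrative choices, not anything from a particular library or codebase; also note that in CPython the GIL limits true CPU parallelism for pure-Python threads, so treat this as a structural illustration of the pattern rather than a benchmark.

```python
# Sketch: chunk a large array and feed the chunks to a pool of worker
# threads adaptively, instead of splitting it into N equal parts up front.
from concurrent.futures import ThreadPoolExecutor


def scale_chunk(chunk, factor):
    """The per-chunk work: scale each sample (of a sound buffer, say)."""
    return [sample * factor for sample in chunk]


def parallel_scale(samples, factor, num_workers=2, chunk_size=1000):
    # Producer side: split the big array into many small chunks.
    chunks = [samples[i:i + chunk_size]
              for i in range(0, len(samples), chunk_size)]
    # Consumer side: the pool hands chunks to whichever worker is free.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(scale_chunk, chunks, [factor] * len(chunks))
    # Reassemble the processed chunks in their original order.
    return [sample for chunk in results for sample in chunk]
```

Even in this tiny sketch you can see the complexity cost: the straightforward loop has grown chunking, pool management, and reassembly logic.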

An ideal solution would be a programming paradigm that yields readable source code and still allows the compiler or the runtime system to identify parallelizable operations and parallelize them, either statically (the compiler) or dynamically (a runtime JIT compiler, after it decides that the up-front cost of setting up the parallelization is less than the gain from parallelizing an operation).

Just as today we have runtime environments with implicit memory management and languages designed for writing programs that run in such an environment, we could soon have environments that have implicit parallelization.

As a typical example, a transformation applied independently to all elements of a collection is a good candidate, provided your programming language lets you express it in such a way. I believe that functional languages are better prepared for this kind of implicit parallelization. There are already some academic-level efforts underway; witness Parallel Haskell.
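The key is how the transformation is expressed. A hypothetical sketch: if you write a map over a collection instead of an index-driven loop, a runtime is free to execute it in parallel without the calling code changing. The `brighten` and `transform` names below are invented for illustration; the parallel branch just uses a stock process pool to stand in for an "implicitly parallel" runtime.

```python
# Sketch: the same declarative map can run sequentially or in parallel,
# because each element is transformed independently of the others.
from multiprocessing import Pool


def brighten(pixel):
    # An independent per-element transformation (say, on an image buffer).
    return min(pixel + 40, 255)


def transform(pixels, parallel=False):
    if parallel:
        # A parallelizing runtime could split the map across cores
        # transparently; here a process pool plays that role.
        with Pool(processes=2) as pool:
            return pool.map(brighten, pixels)
    return list(map(brighten, pixels))
```

The point isn't the pool itself but that the call site is identical either way: nothing in `transform`'s contract reveals whether the map ran on one core or many.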

One very interesting notion some people are vocal about is that stack-based computational models inherently stem from the sequential approach to programming, and that as we strive to embrace programming approaches that naturally lend themselves to parallelization, we'll gradually adopt computation models that aren't stack based. Like the just-mentioned functional programming, where you don't really express your program in terms of subroutine calls. Or how Excel 12 will also feature parallel computations. BTW, the previously linked blog entry also contains links to some interesting research going on in this general area of single-machine parallel computing.

Is it time to say hello again to state machines? Maybe, maybe not. One thing I can more readily imagine is that a widespread architectural model for today's enterprise systems - asynchronous messaging - will somehow get adapted for single-machine, single-process development that is meant to run on multiple CPUs.
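To give a flavor of what in-process asynchronous messaging might look like, here is a toy sketch: components own their state and communicate only through queues, never through shared mutable data. The `worker` function, the `None` shutdown sentinel, and `run_pipeline` are all invented names for illustration, not any real messaging framework.

```python
# Sketch: single-process asynchronous messaging. Workers receive messages
# from an inbox queue and reply via an outbox queue; no locks on shared state.
import queue
import threading


def worker(inbox, outbox):
    # Each worker loops on its inbox and only talks via messages.
    while True:
        msg = inbox.get()
        if msg is None:          # shutdown sentinel
            break
        outbox.put(msg * msg)    # reply with the processed result


def run_pipeline(values, num_workers=2):
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for v in values:
        inbox.put(v)             # fire-and-forget message sends
    results = [outbox.get() for _ in values]
    for _ in threads:
        inbox.put(None)          # one shutdown message per worker
    for t in threads:
        t.join()
    return sorted(results)       # replies arrive in nondeterministic order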

1 comment:

Joe Duffy said...

Great post Attila, and I agree with many of your points. I hope you enjoy my talk at JAOO -- I am going to discuss why our libraries need to become better at accommodating, if not introducing, parallelism over time. I look forward to meeting you there.

-- joe