Wednesday, January 25, 2006

Taking the pain out of C development on Mac OS X

Those of you who know me know me mostly as a Java developer for the last, say, 7 years of my professional life. While that's true - Java work pays my bills - I also occasionally hack C. Not that it doesn't have Java implications; when I hack C, I hack the source code of JamVM, an open source interpreter-only Java VM. Well, I'm not hacking JamVM proper, but instead I have a private fork of it that is extended with detailed execution tracing; it is something I do as part of my academic research on dynamic slicing of Java programs.

Now, going from developing in Java back to developing in C can be, well, intimidating. When I started hacking JamVM, I was still using a Windows machine. Given that JamVM is a project built on UNIX, using the standard automake/autoconf approach, I had a bit of a system mismatch there that, for certain details, unfortunately not even Cygwin could bridge. So I ended up with Linux running inside a VMware instance. Basic work was ok, but whenever I got a segmentation fault, I had a hard time wrestling with gdb or any of its supposed GUIs, wasted countless hours tracking down one bad pointer, and was really nostalgic for the long gone years of my commercial C development using Microsoft Visual Studio. Visual being the key here.

Well, I'm working on a Mac now. And it's a whole lot different. First of all, the "official" compiler for Mac OS X is gcc. What does this mean? Well, among other things it means that all of Apple's developer tools will seamlessly work with software projects built around the standard GNU automake/autoconf setup. That's right. And one of the most priceless tools (even though it is free) is Shark. Shark is an incredibly versatile profiler, and I was in need of one, especially since my "instrumented" JamVM was running like molasses. At the university, we finally beat it into good enough shape to call it "correct", and following the "first make it work, then make it fast" maxim, it was time to subject it to some scrutiny and see how to make it faster.

Profiling was as easy as building the code the standard way, using make and make install from the command line, launching Shark, starting the jamvm VM with a test Java program, and pointing Shark to its process (Shark's icon changes to red when it's collecting samples from a program - "scent of blood"? LOL). When done, it has awesome data analysis features that really let you find the bottlenecks in the blink of an eye - look at the above linked article for sample screenshots. It will even mark source code lines with little exclamation marks that, when clicked, present little popup bubbles with performance enhancement suggestions that apply to that line. Hell, if you really need it, the stuff even allows you to inspect your code at the machine code instruction level and has built-in help for all PowerPC machine opcodes!

So, Shark showed that 80% of the runtime was spent in my tracing code, and most of that in writing the trace event stream to a file. With that knowledge, I introduced a 1MB in-memory buffer for the file, measured with Shark again and saw that the tracing overhead dropped to 57%. Better, but still not good enough. Shark revealed that after this improvement most of the time was spent in lock spinning on a mutex - the trace stream is serial, but the Java programs are multithreaded, so I had to introduce a mutex to guard the writes. After pounding on this for a while, I realized that I could introduce small (16K) per-thread trace buffers, and only flush each of them into the 1MB file buffer when it fills up, when the thread dies, or when the thread is forced to cross a write memory barrier as required by the Java Memory Model. The mutex was now acquired not on every trace event, but only when a thread buffer had to be flushed to the file buffer. (Nota bene: I'm aware this will cause a reordering of events in the event stream such that executions of multithreaded programs that contain race conditions will be incorrectly analyzed - however, at the moment, I don't care about insufficiently synchronized programs as they're buggy anyway and have bigger problems than not being correctly sliceable).
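
For the curious, here is a minimal sketch of what such a buffering scheme could look like in C with pthreads. It is an illustration only, not the actual JamVM fork code: all names (trace_buf, trace_open, trace_event, trace_flush, trace_close) are hypothetical, the 1MB file buffer is assumed to be a plain stdio buffer set up with setvbuf, and only the "buffer is full" flush trigger is shown. Thread exit and memory barriers would simply call trace_flush as well.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define THREAD_BUF_SIZE (16 * 1024)   /* per-thread buffer size from the post */
#define FILE_BUF_SIZE   (1024 * 1024) /* shared file buffer size from the post */

/* Hypothetical per-thread buffer; each thread owns exactly one of these. */
typedef struct {
    unsigned char data[THREAD_BUF_SIZE];
    size_t used;
} trace_buf;

static FILE *trace_file;
static char file_buf[FILE_BUF_SIZE];
static pthread_mutex_t trace_lock = PTHREAD_MUTEX_INITIALIZER;

/* Open the trace file and give stdio a fully buffered 1MB area. */
int trace_open(const char *path)
{
    trace_file = fopen(path, "wb");
    if (trace_file == NULL)
        return -1;
    return setvbuf(trace_file, file_buf, _IOFBF, sizeof file_buf);
}

/* Move one thread's events into the shared stream.
   This is the only place where the mutex is taken. */
void trace_flush(trace_buf *tb)
{
    if (tb->used == 0)
        return;
    pthread_mutex_lock(&trace_lock);
    fwrite(tb->data, 1, tb->used, trace_file);
    pthread_mutex_unlock(&trace_lock);
    tb->used = 0;
}

/* Record one event without locking; flush first if it would not fit. */
void trace_event(trace_buf *tb, const void *ev, size_t len)
{
    assert(len <= THREAD_BUF_SIZE); /* sketch: events never exceed the buffer */
    if (tb->used + len > THREAD_BUF_SIZE)
        trace_flush(tb);
    memcpy(tb->data + tb->used, ev, len);
    tb->used += len;
}

/* Close the stream; callers flush their own buffers beforehand. */
int trace_close(void)
{
    return fclose(trace_file);
}
```

A thread would call trace_event for every event and trace_flush just before it dies or crosses a barrier. The point of the design is that the common path (trace_event) touches only thread-private memory, so the lock is contended once per 16K of events rather than once per event.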

After building the thread buffer feature, I started getting "Bus Error" when I ran the new JamVM. This is the PowerPC equivalent of the dreaded "Segmentation Fault" on x86 that usually took me between two and four hours of meticulous debugging on Linux to track down. A little reading on the Apple Developer Connection site showed me how to configure an "external build system" (read: automake generated) project with an executable in Xcode, and launch it under its debugger. There was a handy "Auto-attach debugger on crash" option I checked. Sure enough, I started the program, and as soon as it crashed WHAM! I was instantly looking at the exact source code location, with exact stack trace and variables, much like what you're used to when debugging Java programs in Eclipse. Remember, I didn't build the program in Xcode - I built it using its standard configure/make/make install routine. Yet, since Xcode itself builds using gcc, it worked seamlessly. I found the reason for the crash in ten seconds, rebuilt and reran, found another similar crash, fixed that too in a minute, and finally it ran flawlessly. It was as easy as debugging a Java program.

After all was fixed, another run with Shark showed that the tracing overhead was reduced to 13% of the total run time. We started from 80% - hooray! This is actually much better than it looks - if you consider that the "useful" program time is, say, 1 minute, then 80% overhead means that the ratio of useful to overhead is 20:80 and the program will run for 5 minutes (overhead generating 4 minutes of run time). 13% overhead means the ratio is 87:13, so the program will only run for 1 minute and 9 seconds. So it is more than four times faster than it was!
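
The arithmetic above can be written down as a tiny C helper. This is only a sketch of the calculation, with hypothetical function names, and it assumes the overhead percentage is a fraction of the total run time, as Shark reports it.

```c
#include <assert.h>

/* Total run time when a given fraction of it is tracing overhead
   and the remainder is the "useful" program time. */
double total_runtime(double useful_seconds, double overhead_fraction)
{
    assert(overhead_fraction >= 0.0 && overhead_fraction < 1.0);
    return useful_seconds / (1.0 - overhead_fraction);
}

/* How many times faster the program gets when the overhead fraction
   drops from `before` to `after` for the same useful work. */
double speedup(double before, double after)
{
    return total_runtime(1.0, before) / total_runtime(1.0, after);
}
```

With 60 seconds of useful time, total_runtime(60, 0.80) comes out at about 300 seconds and total_runtime(60, 0.13) at about 69 seconds, so speedup(0.80, 0.13) is roughly 4.35.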

And the moral of the story? I was able to pinpoint performance bottlenecks and bad memory access errors in minimal time instead of wasting hours, thanks to really world-class Apple developer tools. It's really the same polished usability in these development tools that you're used to with end-user Apple applications: it just works.

At the moment, I'm using Eclipse CDT to work on the JamVM source code and honestly, it has excellent autocompletion even with macro-heavy C source code like JamVM's, but I'm seriously considering trying to do all future source code editing in Xcode, as I'm sure there are some further nice surprises awaiting there.

1 comment:

csab said...

Interesting that, although nothing prevents Unix developers from developing a tool like this, I haven't heard of one yet. However, I had already stopped software development when I changed my home operating system to Linux, so maybe it exists.