Monday, January 30, 2006

My brand new pointless obsession

The press is loud with performance comparisons of iMac G5 and iMac Intel systems, generally showing a mixed picture, with the average advantage being on Intel's side. I was wondering how much compiler optimizations contribute to the performance differences. I mean, stock PPC binaries of commercial apps - and even of Mac OS X itself - clearly can't be fully optimized for the G5, given that they need to run on G3 and G4 processors as well.

Looking at the default Xcode settings in the "Cocoa Application" project template, it defaults to instruction scheduling optimization for G4, not G5, although you can switch it to G5 (which equals -mtune=970 in GCC, I assume). In Xcode, the GCC switch for enabling G5-specific instructions isn't even easily accessible! You need to rewrite the architecture string from the generic "ppc" to "ppc970", and sneak -mcpu=970 into the "Other C flags" setting, all manually -- the Xcode GUI won't assist you with any of this. Then again, probably no commercial software out there uses these settings, since the binaries need to be able to run on pre-G5 CPUs as well.

So, I'm currently obsessed with the following question: is it possible to build "Universal binaries" that carry not two code versions - generic PPC and Intel - but three: generic PPC, PPC970, and Intel? I verified that Xcode has no problem building the code if I modify the architecture setting to "ppc ppc970 i386", and I could confirm that the generated executable indeed has all three platforms in it. However, the code was rather trivial, and - except for a few debug symbols - the generic PPC and the G5 code were identical.
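Out of curiosity, I peeked at how the three slices coexist in one file: a universal binary starts with a Mach-O "fat" header that simply lists the CPU type/subtype and file offset of each embedded architecture (this is also what lipo -info and file report on a Mac). Here's a little parser sketch of that header; the constant names and values are the real ones from Apple's <mach-o/fat.h> and <mach/machine.h>, but the sample header bytes below are synthetic:

```python
import struct

# Mach-O fat ("universal") binary constants, per Apple's headers.
FAT_MAGIC = 0xCAFEBABE
CPU_TYPE_POWERPC = 18
CPU_TYPE_I386 = 7
CPU_SUBTYPE_POWERPC_ALL = 0
CPU_SUBTYPE_POWERPC_970 = 100
CPU_SUBTYPE_I386_ALL = 3

def arch_name(cputype, cpusubtype):
    if cputype == CPU_TYPE_POWERPC:
        return "ppc970" if cpusubtype == CPU_SUBTYPE_POWERPC_970 else "ppc"
    if cputype == CPU_TYPE_I386:
        return "i386"
    return "unknown(%d,%d)" % (cputype, cpusubtype)

def list_architectures(data):
    """Return the architecture names listed in a fat binary's header."""
    magic, nfat = struct.unpack(">II", data[:8])  # header is big-endian
    if magic != FAT_MAGIC:
        raise ValueError("not a big-endian fat binary")
    archs = []
    for i in range(nfat):
        # each fat_arch record: cputype, cpusubtype, offset, size, align
        cputype, cpusubtype, offset, size, align = struct.unpack(
            ">iiIII", data[8 + 20 * i : 28 + 20 * i])
        archs.append(arch_name(cputype, cpusubtype))
    return archs

# A synthetic three-way header like the one Xcode would produce for
# "ppc ppc970 i386" (the offsets/sizes here are dummies):
header = struct.pack(">II", FAT_MAGIC, 3)
header += struct.pack(">iiIII", CPU_TYPE_POWERPC, CPU_SUBTYPE_POWERPC_ALL, 4096, 100, 12)
header += struct.pack(">iiIII", CPU_TYPE_POWERPC, CPU_SUBTYPE_POWERPC_970, 8192, 100, 12)
header += struct.pack(">iiIII", CPU_TYPE_I386, CPU_SUBTYPE_I386_ALL, 12288, 100, 12)
print(list_architectures(header))  # ['ppc', 'ppc970', 'i386']
```

On a real binary you'd read the first bytes of the file and feed them in instead of the synthetic header.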

So, my current challenge is to create an executable that hosts both generic PPC and PPC970 code (Intel is pretty much irrelevant for this experiment), have the PPC970 code be fully optimized for the G5, and have Mac OS X load it instead of the generic PPC code on a G5 CPU.

Of course, I have very limited time to spend on this newly acquired "hobby", since it has nothing to do with what I do for a living, meaning I get to devote time to it only when I manage to sneak away from the family for half an hour in the evenings. It's, however, a good vector for getting acquainted with generic hacking on this new OS platform. I really need a bit of an intellectual change of air, as I've been too Java-focused for the last 7 years.

Wednesday, January 25, 2006

Taking out the pain from C development on Mac OS X

Those of you who know me know me mostly as a Java developer for the last, say, 7 years of my professional life. While that's true - Java work pays my bills - I also occasionally hack C. Not that it doesn't have Java implications; when I hack C, I hack the source code of JamVM, an open source interpreter-only Java VM. Well, I'm not hacking JamVM proper; instead I have a private fork of it that is extended with detailed execution tracing - it is something I do as part of my academic research on dynamic slicing of Java programs.

Now, going from developing in Java back to developing in C can be, well, intimidating. When I started hacking JamVM, I was still using a Windows machine. Given that JamVM is a project built on UNIX, using the standard automake/autoconf approach, I had a bit of a system mismatch there that, in certain details, unfortunately not even Cygwin could bridge. So I ended up with Linux running inside a VMware instance. Basic work was OK, but whenever I got a segmentation fault, I had a hard time wrestling with gdb or any of its supposed GUIs, wasted countless hours tracking down one bad pointer, and was really nostalgic for the long-gone years of my commercial C development using Microsoft Visual Studio. Visual being the key word here.

Well, I'm working on a Mac now. And it's a whole lot different. First of all, the "official" compiler for Mac OS X is gcc. What does this mean? Well, among other things, it means that all of Apple's developer tools will work seamlessly with software projects built around the standard GNU automake/autoconf setup. That's right. And one of the most priceless tools (even though it is free) is Shark. Shark is an incredibly versatile profiler, and I was in need of one, especially since my "instrumented" JamVM was running like molasses. At the university, we finally beat it into good enough shape to call it "correct", and following the "first make it work, then make it fast" maxim, it was time to subject it to some scrutiny and see how to make it faster.

Profiling was as easy as building the code the standard way, using make and make install from the command line, launching Shark, starting the jamvm VM with a test Java program, and pointing Shark at its process (Shark's icon changes to red when it's collecting samples from a program - "scent of blood"? LOL). When done, its awesome data analysis features really let you find the bottlenecks in the blink of an eye - look at the above linked article for sample screenshots. It will even mark source code lines with little exclamation marks that, when clicked, present little popup bubbles with performance enhancement suggestions that apply to that line. Hell, if you really need it, the thing even allows you to inspect your code at the machine code instruction level, and has built-in help for all PowerPC machine opcodes!

So, Shark showed that 80% of the runtime was spent in my tracing code, and most of that in writing the trace event stream to a file. With that knowledge, I introduced a 1MB in-memory buffer for the file, measured with Shark again, and saw that the tracing overhead dropped to 57%. Better, but still not good enough. Shark discovered that after this improvement, most of the time was now spent spinning on a mutex - the trace stream is serial, but Java programs are multithreaded, so I had to introduce a mutex to guard the writes. After pounding on this for a while, I realized that I could introduce small (16K) per-thread trace buffers, and only flush each of them into the 1MB file buffer when it either fills up, when the thread dies, or when the thread is forced to cross a write memory barrier as required by the Java Memory Model. The mutex is now not acquired whenever a trace event is generated, but only whenever a thread buffer has to be flushed to the file buffer. (Nota bene: I'm aware this will cause a reordering of events in the event stream, so that executions of multithreaded programs containing race conditions will be incorrectly analyzed - however, at the moment, I don't care about insufficiently synchronized programs, as they're buggy anyway and have bigger problems than not being correctly sliceable.)

After building the thread buffer feature, I started getting "Bus Error" when I ran the new JamVM. This is the PowerPC equivalent of the dreaded "Segmentation Fault" on x86, which usually took me between two and four hours of meticulous debugging on Linux to track down. A little reading on the Apple Developer Connection site showed me how to configure an "external build system" (read: automake-generated) project with an executable in Xcode, and launch it under its debugger. There was a handy "Auto-attach debugger on crash" option, which I checked. Sure enough, I started the program, and as soon as it crashed - WHAM! - I was instantly looking at the exact source code location, with the exact stack trace and variables, much like what you're used to when debugging Java programs in Eclipse. Remember, I didn't build the program in Xcode - I built it using its standard configure/make/make install routine. Yet, since Xcode itself builds using gcc, it worked seamlessly. I found the reason for the crash in ten seconds, rebuilt and reran, found another similar crash, fixed that too in a minute, and finally it ran flawlessly. It was as easy as debugging a Java program.

After all was fixed, another run with Shark showed that the tracing overhead was reduced to 13% of the total run time. We started from 80% - hooray! This is actually much better than it looks: if you consider that the "useful" program time is, say, 1 minute, then 80% overhead means the ratio of useful time to overhead is 20:80, and the program will run for 5 minutes (the overhead generating 4 minutes of run time). 13% overhead means the ratio is 87:13, so the program will only run for 1 minute and 9 seconds. So it is practically four times faster than it was!
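Spelling out that arithmetic (a trivial sketch, using the numbers from the text):

```python
def total_runtime(useful_minutes, overhead_fraction):
    """Total run time when `overhead_fraction` of the total is overhead."""
    return useful_minutes / (1.0 - overhead_fraction)

before = total_runtime(1.0, 0.80)   # 5 minutes total
after = total_runtime(1.0, 0.13)    # ~1.15 minutes, i.e. 1 min 9 s
print("%.2f" % (before / after))    # speedup factor
```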

And the moral of the story? I was able to pinpoint performance bottlenecks and bad memory access errors in minimal time instead of wasting hours, thanks to really world-class Apple developer tools. It's really the same polished usability in these development tools that you're used to with end-user Apple applications: it just works.

At the moment, I'm using Eclipse CDT to work on the JamVM source code, and honestly, it has excellent autocompletion even with macro-heavy C source code like JamVM's, but I'm seriously considering trying to do all future source code editing in Xcode, as I'm sure there are some further nice surprises awaiting there.

Friday, January 20, 2006

Public domain podcast

Public domain podcast. In its own words:

Great works read out loud in a weekly podcast. Authors include Mark Twain, Jules Verne, Edgar Allen Poe and many others. The website also includes links to public domain resources & topics of interest to literary and audiobook fans.

Just the thing I need to occupy myself while working out. Usually I just listen to the radio, but am always thinking about how I could use the time better if I were listening to something of more value than the mix of chitchat and popular music that's typically aired on local stations.

Too bad there's no similar service for technical books. Right now, for instance, I could honestly make better use of an audiobook version of Code Complete 2, which I've been unable to finish reading for, like, two months now :-)

Thursday, January 19, 2006

Where privacy is tradition

In light of the US Department of Justice demanding user search records from Google, Boing Boing quoted this excerpt from John Battelle's "The Search", which I'll repeat here:

As we move our data to the servers at,,, and, we are making an implicit bargain, one that the public at large is either entirely content with, or, more likely, one that most have not taken much to heart.

That bargain is this: we trust you to not do evil things with our information. We trust that you will keep it secure, free from unlawful government or private search and seizure, and under our control at all times. We understand that you might use our data in aggregate to provide us better and more useful services, but we trust that you will not identify individuals personally through our data, nor use our personal data in a manner that would violate our own sense of privacy and freedom.

That’s a pretty large helping of trust we’re asking companies to ladle onto their corporate plate. And I’m not sure either we or they are entirely sure what to do with the implications of such a transfer. Just thinking about these implications makes a reasonable person’s head hurt.

If the U.S. government gets any more invasive, I'd suggest Google move to where privacy is a tradition - Switzerland. Just leave behind a small marketing department to do business with US advertisers.

Black art of backing up a Mac

I depend way too much on my computer not to have it backed up frequently. So, I bought a FireWire drive for my new iMac three weeks ago. I just vaguely specified a "250 GB FireWire drive" when I ordered over the phone from my local Apple dealer, so I should not have been surprised when they sold me an M9-DX :-). The thing is designed for a Mac Mini - it has exactly the same dimensions and is originally intended to be placed underneath one - so now I have something that resembles a Mac Mini attached to the back of my iMac. It unfortunately ruins the "where is the computer?" fun with people who see an iMac for the first time in their life, as now they assume that the external hard drive must be the computer...

Anyway, now that I have the hardware to back up to, I need software. Well, this proved to be the hard part. I was accustomed to the built-in backup app in Windows and was a bit perplexed that there's no built-in solution on a Mac. I tried both Carbon Copy Cloner and Psync (too lazy to dig out the links, you can Google for them if you want). They both did their job, but they have one serious shortcoming: they won't deal with FileVault. FileVault is the Mac OS X feature for encrypting the home directory - it creates a disk image encrypted with a 128-bit AES key and mounts it onto your home directory. Unless you first create an identical setup on your backup volume, fiddling on the command line with hdiutil, you'll expose your protected data unencrypted in the backup. Not good.
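For completeness, the kind of hdiutil fiddling I mean looks roughly like this - a sketch only, with an illustrative size and paths; check man hdiutil on your system before trusting it:

# Create an AES-encrypted, journaled-HFS+ sparse image on the backup
# volume; hdiutil will prompt for a passphrase. Size and paths are
# illustrative.
hdiutil create -size 60g -type SPARSE -fs HFS+J \
-encryption -volname aszegedi \
/Volumes/Backup/Users/.aszegedi/aszegedi.sparseimage

Being a sparse image, it only consumes backup-drive space as actual data is written into it.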

However, I got concerned with efficiency. You see, the formatted capacity of both the iMac's drive and the backup drive is 232 GB, and I currently use 49 GB. A much more efficient way to utilize all that vast space on the backup drive would be to do incremental backups, but at the same time retain diffs and thus the ability to restore any previously backed up state. It turns out this wet dream is not a dream, but a reality - and it is called rdiff-backup. This cutie does exactly that - it can back up a complete volume or select directories, and keeps a separate directory inside the backup that holds reverse diffs for previous versions. It uses the highly efficient binary diff algorithm used by rsync, and in case the backup drive ever fills up with diffs, old ones can be purged by specifying the cutoff criteria in a few different ways (number of diffs to keep, maximum age of diffs to keep, etc.). It's just insanely great. It is a generic open source UNIX utility, so it works for all you Linux people out there as well. Actually, it works under Windows too. The easiest way to get it on a Mac is to install it through the Fink GNU distro.
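To make that concrete, here are the kinds of invocations I mean - the paths are illustrative, but the flags are rdiff-backup's real ones (see its man page):

# Back up a directory; unchanged files cost nothing, and changed files
# leave reverse diffs under the rdiff-backup-data/ directory
rdiff-backup /Users/aszegedi /Volumes/Backup/Users/aszegedi

# List the increments (restorable past states) present in the backup
rdiff-backup --list-increments /Volumes/Backup/Users/aszegedi

# Restore the state as of three days ago into another directory
rdiff-backup -r 3D /Volumes/Backup/Users/aszegedi /tmp/restored

# Purge increments older than four weeks when the drive fills up
rdiff-backup --remove-older-than 4W /Volumes/Backup/Users/aszegedi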

There are caveats, though. The Mac OS X filesystem supports extended attributes, and while rdiff-backup will handle them if a suitable xattr library is available, you need to find one yourself separately. The "original" xattr on SourceForge is not Mac OS X compliant. Fortunately, there's a Mac OS X version - you can get it from You'll have to drop it into the correct Python distribution (rdiff-backup is written in Python) to have it picked up by rdiff-backup. If you installed rdiff-backup through Fink, the correct location is Fink's Python 2.4 library directory, namely /sw/lib/python2.4/site-packages/xattr.

Finally, if you use FileVault, you again need to prepare a FileVault equivalent on your backup volume. I only did a solution that works for a single user (me) who's logged in during the backup. Basically, my backup script will mount the encrypted disk image on the backup volume, backup into it, then unmount it. Of course, I could have just backed up the encrypted image file from one volume to the other, however I think the rsync diff algorithm wouldn't be too efficient on such a high-entropy content. So, it looks like this:

# Create the mount point on the backup volume
mkdir /Volumes/Backup/Users/aszegedi

# Mount the encrypted sparse image over it
hdiutil attach /Volumes/Backup/Users/.aszegedi/aszegedi.sparseimage \
-owners on -nobrowse \
-mountpoint /Volumes/Backup/Users/aszegedi

# Back up the home directory into the mounted (thus encrypted) image
rdiff-backup /Users/aszegedi /Volumes/Backup/Users/aszegedi

# Unmount the image
hdiutil detach /Volumes/Backup/Users/aszegedi

Of course, it is also important NOT to back up the original /Users/.aszegedi/aszegedi.sparseimage!

Aside from these finer points, the rest of the script closely resembles the one presented on the Carbon Copy Cloner site, except that rdiff-backup is used in place of ditto, and the /Users directory is not backed up blindly, to avoid backing up the FileVault disk image. There is, however, similarly to CCC, a separate invocation for every backed-up directory, a recreation of root symlinks on the backup volume following the backup, as well as a bless to make the backup bootable.

Took me "just" two evenings to sort it all out :-)

Wednesday, January 18, 2006

Remembered for the light

This is an insanely hard topic to write about.

I didn't know Rebeka any better than anyone else who only got a limited insight into her last days from the evening news and the papers. A four-year-old little girl with a mysterious disease that attacked her liver; her doctors and her family struggled for her life for days. She was the first to receive a liver transplant from a live donor - her father. For a few days there was hope, then the new organ also stopped functioning, attacked by the same disease that ruined her own liver. Days went by waiting for a new donor while her condition grew worse. Then finally they found a new donor. The news reported that the second transplant seemed to be working okay after the operation. And then, one day later, her much-troubled tiny body gave up and she died from a sudden pulmonary hemorrhage.

I can't imagine what her family went through in those last days. Her father gave up part of his liver. They probably went from hope to despair and back again one too many times. It must have been a living hell. It probably still is.

They said in the evening news yesterday that she died a few minutes before 6 PM. I'm definitely not superstitious, but I recalled that exactly five minutes before 6 PM, I was at a public bath (there was a clock on the wall, that's why I recall the time) with my son - I was watching him have fun on the waterslide after his swimming lesson - and I was suddenly overcome by the feeling that something was wrong with my own four-year-old daughter; I had a vision of her badly injured, on life support in a hospital. I tried to convince myself that she was fine, that she was with my sister, who happens to be a doctor and who earlier spent a year as a full-time au pair. The vision was so strong, though, that it was very hard to fight the urge to go from the pool back to the changing room and call my sister on the phone immediately. I managed for another five minutes, then told Ákos we were going home. As I said, I'm not superstitious; I believe it is a coincidence, but an eerie coincidence nonetheless.

Children die in this world. It is one of the facts of the universe that is hardest to come to terms with. Actually, I don't think I can come to terms with it. Almost a year ago Álmos, a little boy, a kindergarten playmate of my son, died. He had had a heart condition since birth and went through several heart surgeries during the few years of his life. His parents were completely devoted to ensuring a normal life for their son. They were with him through every surgery; they took him to special gymnastics and swimming to make his heart stronger. They put enormous effort into it. There was to be one last surgery that would finally allow him to grow up into a normal young man, with the next surgery not due until his twenties. He died during that surgery - an air bubble accidentally got released into his cardiovascular system and burst a vein inside his brain. I heard that the surgeon admitted responsibility. I remember how my own heart sank when I saw the black flag on the kindergarten's facade. I also remember my own helpless rage when I first thought of how a young soul in perfect mental health was denied the wonders of childhood, of growing up, of taking part to the fullest in the wonder of existence, just because he was unlucky enough to be bound to a body with a bad heart, and whose surgeon made one bad move.

We met the parents in the cemetery the day after the funeral - we were unable to attend the funeral itself, so we went to the cemetery one day later. My son wanted to say goodbye to his friend as well - he is a highly sensitive and intelligent little kid, and he learned to accept that the death of our loved ones is part of our lives when we lost my wife's mother nearly three years ago. Anyway, we were already leaving the cemetery when we came across the parents, who were just arriving. I was unable to say anything comforting to them. I had never before faced people grieving over the loss of their child. I didn't feel that uttering any of the typical phrases was appropriate. I felt that whatever I said would sound cheap, and cheapening this tragedy was out of the question. Finally I told them the only thing that I could say honestly: "I would love to be able to say something wise and comforting to you. Alas, I can not. I can not think of any words that would be appropriate for your tragedy."

And I still can not.

May you always be remembered for the light you brought into the lives of your beloved ones while you were with them.

Friday, January 13, 2006

Protected Species

Yesterday evening we had a babysitter, so Kriszti and I went out to see a movie. The mall where the Cinema City is located also has one of the city's biggest bookstores, which also carries CDs. I remembered that I had decided to look for the new Gorillaz album, "Demon Days"; I had heard a few tracks already and read positive critical acclaim, so all in all it looked like it was worth risking the money on it. At first, I don't find it on the display racks, but I notice there's the new album from the Black Eyed Peas, "Monkey Business". Having heard two tracks from it on the radio already and liking them, I decide I might buy it if I don't find what I came for originally. The CD also lacks any indication of being DRM-crippled, so all is well. A little while later, I find the Gorillaz CD as well.

There's a problem, though. It's inscribed with "This CD contains copy control mechanism" (in plain English: "we crippled this CD, as a futile attempt to prevent it from being copied is much more important to us than your convenience, dear customer"). Uh-oh. I know I import all my CDs into iTunes and listen to them there, so with that in mind, let's inspect it a little bit closer. The information on the jewel case claims that the DRM is compatible with Mac OS X. Double uh-oh. If it were ignorant of Mac OS X, I'd only need to overcome my sense of ethics to buy it, but since it claims to actually be aware of Mac OS X, it could pose some real danger to my system, and damned if I'll ignore my sense of security - I depend on that computer for earning my living.

I walk over to the cashier and ask the lady:

- "This CD has copy protection. If I have problems playing it, will you take it back?"

She tries to dismiss my concerns: - "Oh, that only means you can not copy it to a blank disc!"

Yeah, sure, but I have other concerns: - "But it is also printed on the back that it may not play correctly in some players. If it won't play correctly in mine, will you take it back?"

She's again a bit naive: - "It plays fine everywhere, except maybe in some computers!"

Well, that's where I want to play it, so I'm pressing the point: - "Will you take it back if I can't play it?"

She finally gives me a very good advice: - "Sir, I suggest you'd better not buy it if you have concerns."

I remember my first encounter with a crippled CD - it was Chris Rea's "Stony Road", a few years back. I spotted it on my boss' desk one day and borrowed it, but then struggled with playing it through the work PC's CD-ROM drive. It made the machine lock up. It made the CD-ROM drive emit all kinds of scary noises. It was a definite hazard to the machine and my work environment.

With that memory recalled, I decided I'd listen to the cashier lady's advice, and there'll be no Gorillaz for me. I don't buy shoddy merchandise. None of my money for them. No matter how much effort went into the production of the content and how brilliant it turned out, if it's accompanied by a copy protection measure that radiates the message "dear consumer, we assume you might be a thief", its appeal to me drops to zero. I'm listening to my new Black Eyed Peas album right now, though. Ironically enough, when I told the story to a youngster friend of mine today, he instantly offered to look up the Gorillaz album in MP3 on his college's campus network :-). I declined, though. I don't need that music so badly that I'd be willing to either obtain it illegally or succumb to a DRM scheme (whichever is worse). Every single song I have in my iTunes at the moment is imported from a CD I bought, and I will keep it this way.

Moral of the story: they lost a customer to a CD-crippling scheme that does nothing to protect their content, since the content seems to be very easily obtainable illegally anyway. I remember that about a year ago Faithless also lost a sale, as I didn't buy "No Roots" because it, too, was released as a crippled CD. I wrote to the band and got a response from their manager expressing that they are not pleased with DRM either, but the publisher forces it on them. Too bad, guys.

I'll keep buying discs conforming to CD Digital Audio standard. I'll keep not buying shiny plastic discs that contain random bits of data that might or might not be playable, and that might attack my computer systems if coming into physical contact with them. At the moment, there's still enough good music available on non-crippled CDs out there. I hope it stays this way.

Tuesday, January 10, 2006

Serving a different purpose

As a break from the everyday life of a software developer, here's an excellent, intelligently written weblog of a waiter. Maybe a little bit heavy on the aspects of serving and ordering wine (as if there were anything wrong with that!), but you'll find great short stories about the guests too.

Update: This is just the funniest sentence I've read in the last few days, from one of the stories (on December 30th): "Of course the mall is packed with people converting unwanted gifts into iPods." ROTFL. It reminds me that I'm still feeling a bit guilty that I started too late this Christmas, so I couldn't find my wife an iPod Nano anywhere in the city when I tried shopping for it. Maybe I could try to "convert" her replacement present now (the deadline for gift replacement is January 15 here).

Saturday, January 07, 2006

A language development to keep an eye on

.Net Language Integrated Query. I won't summarize it further than the previous link, but do yourself a favor and read it if you've ever had to mix SQL strings into your code. This language innovation is big - a strongly typed (instead of SQL-string based) query facility for arbitrary data models (not just relational). If you want, you can apply filtering and projection to an in-memory array of objects no differently than you'd apply them to a relational table. What's really stunning, however, is all the other language innovation underpinning it - C# 3.0 evolved features like object initializers, lambda functions, extension methods, and anonymous types to support LINQ, and the absolutely super-stunning aspect of it all is that they succeeded in implementing almost all of them as syntactic sugar managed completely by the compiler (no-sugar versions of these constructs still being available to the programmer when needed). I get the feeling C# is seriously leaving Java behind in language innovation. Oh yeah, one of its primary architects is Don Box (but don't assume I'm judging it by authority).

Old computers nostalgia

For all you people old enough to feel nostalgic about ten- and twenty-something-year-old hardware, here's a nice online museum I just discovered. I personally owned a Commodore 64C and an Atari 520 STm, but I remember drooling over the specs of the NeXT Cube in a computer magazine article when I was an 8th grader in elementary school (read under the bench during class, of course - I lived in a small village where everyone knew everybody, so the mailman often delivered my magazines to me directly when he delivered the school's mail. I don't know if he deliberately timed his arrival for a recess, but he was pretty good at it, delivering me a strongly tempting distraction from lectures once a month). I also spent a considerable amount of time programming the Apple IIc+ machines in the computer lab of my secondary school.

Anyway, here's a little mental game: assign a probability for each of the below listed contemporary machines to be remembered in this or similar computer museum twenty years from now:

  • Generic whitebox Pentium-4 PCs

  • Branded Pentium-4 PCs (Dell, HP, etc.)

  • IBM ThinkPads

  • Apple iMac G5

Update: Another site documenting vintage hardware that is worth visiting is here. It seems to have a smaller selection of hardware, however they are described in more detail and with more pictures. Incidentally, Boing Boing yesterday published a post related to a scanned Atari ST magazine from 1988.

Monday, January 02, 2006

SCP/SFTP GUI client for Mac

Fugu is an excellent replacement for WinSCP on Mac.


Yesterday I was confronted with my kid asking whether all snowflakes have six sides. It turns out the answer is yes, or more precisely, "A snowflake always has six lines of symmetry, which arises from the hexagonal crystal structure of ordinary ice" (quote from the linked page).

What is even more amazing, however, are the photos of snowflakes on the NOAA website: an archive of the work of Wilson Bentley, whose hobby was photographing snowflakes, who developed the method and apparatus for photographing them before they melted, and who made more than 5000 pictures of them at the end of the 19th and the beginning of the 20th century.


Moving to Blogger

I decided to switch from JRoller to Blogger (I'm not alone in this regard), as I found JRoller to have several shortcomings, like lack of email notification on comments. Also, not the least of concerns is that JRoller is meant to be a Java-themed blog site, so it always felt wrong to blog about nontechnical issues over there. This at least is no longer a problem :-)

The most daunting task of setting up a new blog is coming up with a title and - in the case of Blogger - a subdomain in the * domain for URL disambiguation. After figuring out a title (and fighting the urge to just name it "foo", "This is not a title", or "We don't need no steenkeen' titles"), I started looking for a nice name for the subdomain part of the URL. The nicest URL for a blog named "Constantly Changing" on Blogger would of course be, however it is unfortunately already taken by a stillborn blog that only lived to produce two test entries two years ago. The next attempt was, but it's also currently occupied by four-year-old intellectual debris. Next is, which hosts two blog entries, in the more recent of which we learn that the owner of the blog was "still in a state of shock over the terrorist attacks" 13 days after 9/11, though on the flip side "At least the Dolphins won yesterday". Well, I can't even begin to describe how pathetic it seems to me that someone can draw a parallel between events that are so many orders of magnitude apart in importance. I really didn't expect to unearth such gems while looking for a nice short URL. Next on we have the obligatory test entry, followed by an entry about how "Blogger sucks". Apparently it sucked enough for the owner to never look back at their poor orphaned blog ever since. Finally, is yet another single-entry blog, one that allows us to peek into a day of a guy who woke up to his alarm clock, walked his dog, and returned home with milk and that day's papers. I'm not sure which of these events was so out of the ordinary in his everyday life that he felt he must share it with the world, but who am I to judge :-)

So I've settled for "constc". To me, it also associates with the speed of light, so being a generic science geek, I like it even more for that.