Thursday, January 19, 2006

Black art of backing up a Mac

I depend way too much on my computer to not have it backed up frequently. So, I bought a FireWire drive for my new iMac three weeks ago. I just vaguely specified a "250 GB FireWire drive" when I ordered over the phone from my local Apple dealer, so I should not have been surprised when they sold me an M9-DX :-). It is designed for a Mac Mini - it has exactly the same dimensions and is originally intended to be placed underneath one, so now I have something that resembles a Mac Mini attached to the back of my iMac. It unfortunately ruins the "where is the computer?" fun with people who see an iMac for the first time in their life, as now they assume that the external hard drive must be the computer...

Anyway, now that I have the hardware to back up to, I need software. Well, this proved to be the hard part. I was accustomed to the built-in backup app in Windows and was a bit perplexed that there's no built-in solution on a Mac. I tried both Carbon Copy Cloner and Psync (too lazy to dig out the links, you can Google for them if you want). They both did their job, but have one serious shortcoming: they won't deal with FileVault. FileVault is the Mac OS X feature for encrypting the home directory - it creates a disk image encrypted with a 128-bit AES key and mounts it onto your home directory. Unless you first create an identical setup on your backup volume by fiddling with hdiutil on the command line, you'll expose your protected data unencrypted in the backup. Not good.
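For illustration, the one-time hdiutil setup on the backup volume could look something like the sketch below. Treat it as a rough outline under assumptions, not a recipe: the 100g size cap, the volume name and the helper names (run, create_backup_image) are made up for the example, and the paths simply mirror the ones used later in this post. It only prints the commands unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: one-time creation of an encrypted sparse image on the backup volume.
# The size cap and all paths are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

create_backup_image() {
    user="$1"      # short user name, e.g. aszegedi
    volume="$2"    # backup volume mount point, e.g. /Volumes/Backup
    size="$3"      # maximum size the sparse image may grow to, e.g. 100g

    # Hidden per-user directory that will hold the image file.
    run mkdir -p "$volume/Users/.$user"

    # -type SPARSE should make hdiutil append ".sparseimage" to the name;
    # -encryption makes it ask for a passphrase interactively.
    run hdiutil create -size "$size" -type SPARSE -encryption \
        -fs HFS+J -volname "$user" "$volume/Users/.$user/$user"
}

create_backup_image aszegedi /Volumes/Backup 100g
```

A sparse image only takes up as much room as the data inside it, so a generous size cap costs nothing up front.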

Beyond that, I got concerned with efficiency. You see, the formatted capacity of both the iMac's drive and the backup drive is 232 GB. I currently use 49 GB. A much more efficient way to utilize all that vast space on the backup drive would be to do incremental backups, but at the same time retain diffs and thus the ability to restore any previously backed up state. It turns out this wet dream is not a dream, but a reality - and it is called rdiff-backup. This cutie does exactly this - it can back up a complete volume or selected directories, and keeps a separate directory inside the backup that holds reverse diffs for previous versions. It uses the highly efficient binary diff algorithm used by rsync, and in case the backup drive ever fills up with diffs, old ones can be purged by specifying the cutoff criteria in a few different ways (number of diffs to keep, maximum age of diffs to keep, etc.). It's just insanely great. It is a generic open source UNIX utility, so it works for all you Linux people out there as well. Actually, it works under Windows too. The easiest way to get it on a Mac is to install it through the Fink GNU distro.
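To make the above a bit more concrete, here is a small sketch of the rdiff-backup operations just described: backing up, listing the stored increments, restoring an older state and purging old diffs. The directory names and the wrapper function names are assumptions for the example; the rdiff-backup options themselves are from its classic command line syntax. As written it only prints the commands (set DRY_RUN=0 to run them).

```shell
#!/bin/sh
# Sketch of common rdiff-backup operations (classic command line syntax).
# All paths below are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Incremental backup: the destination mirrors the latest state, while
# reverse diffs accumulate in its rdiff-backup-data subdirectory.
backup()           { run rdiff-backup "$1" "$2"; }

# Show which past states can still be restored.
list_increments()  { run rdiff-backup --list-increments "$1"; }

# Restore the state as of a point in time (e.g. 10D = ten days ago).
restore_as_of()    { run rdiff-backup --restore-as-of "$1" "$2" "$3"; }

# Purge old diffs: 4W = older than four weeks, 20B = older than the
# 20th most recent backup. --force is needed when this removes more
# than one increment in a single go.
purge_older_than() { run rdiff-backup --force --remove-older-than "$1" "$2"; }

backup           /Users/Shared /Volumes/Backup/Users/Shared
list_increments  /Volumes/Backup/Users/Shared
restore_as_of    10D /Volumes/Backup/Users/Shared /tmp/Shared-restored
purge_older_than 4W  /Volumes/Backup/Users/Shared
```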

There are caveats, though. The Mac OS filesystem supports extended attributes, and while rdiff-backup will handle them if it finds a suitable xattr library, it does not ship with one, so you need to get it separately. The "original" xattr on SourceForge is not Mac OS X compliant. Fortunately, there's a Mac OS X version - you can get it from http://undefined.org/python/. You'll have to drop it into the correct Python distro (rdiff-backup is written in Python) to have it picked up by rdiff-backup. In case you installed rdiff-backup through Fink, the correct location is Fink's Python 2.4 libraries, namely /sw/lib/python2.4/site-packages/xattr.
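A quick way to check whether the module landed in the right place is to ask the interpreter itself. The sketch below assumes that Fink's Python 2.4 binary is /sw/bin/python2.4 (adjust PY if yours lives elsewhere); have_py_module is just a helper name made up for the example.

```shell
#!/bin/sh
# Sketch: check whether a given Python interpreter can import a module.

have_py_module() {
    py="$1"    # path to (or name of) the Python interpreter
    mod="$2"   # module name to try to import
    "$py" -c "import $mod" >/dev/null 2>&1
}

# Assumption: Fink's Python 2.4 interpreter is /sw/bin/python2.4.
PY="${PY:-/sw/bin/python2.4}"

if have_py_module "$PY" xattr; then
    echo "xattr is importable by $PY"
else
    echo "xattr is NOT importable by $PY (or $PY is missing)"
fi
```

If the import fails here, rdiff-backup run by that same interpreter will not be able to use the module either.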

Finally, if you use FileVault, you again need to prepare a FileVault equivalent on your backup volume. I only did a solution that works for a single user (me) who's logged in during the backup. Basically, my backup script will mount the encrypted disk image on the backup volume, back up into it, then unmount it. Of course, I could have just backed up the encrypted image file from one volume to the other, however I think the rsync diff algorithm wouldn't be too efficient on such high-entropy content. So, it looks like this:


...
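# Create the mount point on the backup volume (no error if it already exists)
mkdir -p /Volumes/Backup/Users/aszegedi

# Mount the encrypted image; stop here if that fails, so nothing
# is ever written unencrypted into the bare mount point directory
hdiutil attach /Volumes/Backup/Users/.aszegedi/aszegedi.sparseimage \
-owners on -nobrowse \
-mountpoint /Volumes/Backup/Users/aszegedi || exit 1

# Incremental backup of the (mounted) home directory into the image
rdiff-backup /Users/aszegedi /Volumes/Backup/Users/aszegedi

# Unmount the encrypted image again
hdiutil detach /Volumes/Backup/Users/aszegedi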
...


Of course, it is also important to NOT back up the original /Users/.aszegedi/aszegedi.sparseimage!
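One possible way to keep the image out of the backup is to walk the entries of /Users one by one instead of handing the whole directory to rdiff-backup. The sketch below is a guess at such a loop, not the actual script: the function name and paths are assumptions. It relies on the shell glob * not matching hidden entries such as .aszegedi, and it skips the FileVault user's own home because that one is backed up separately into the encrypted image. It prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: back up every directory under /Users except the FileVault bits.
# Function name and paths are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

backup_other_users() {
    users_dir="$1"   # e.g. /Users
    volume="$2"      # e.g. /Volumes/Backup
    fv_user="$3"     # FileVault user whose home is handled separately

    # "*" does not match dot-entries, so the hidden directory holding
    # the sparse image (e.g. /Users/.aszegedi) is never visited.
    for dir in "$users_dir"/*; do
        [ -d "$dir" ] || continue
        name="$(basename "$dir")"
        if [ "$name" = "$fv_user" ]; then
            continue   # goes into the encrypted image instead
        fi
        run rdiff-backup "$dir" "$volume/Users/$name"
    done
}

backup_other_users /Users /Volumes/Backup aszegedi
```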

Aside from these finer points, the rest of the script closely resembles the one presented on the Carbon Copy Cloner site, except that rdiff-backup is used in place of ditto and the /Users directory is not backed up blindly, to avoid backing up the FileVault disk image. As in the CCC script, there is a separate invocation for every backed-up directory, the root symlinks are recreated on the backup volume after the backup, and a final bless makes the backup bootable.
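The last two steps, recreating the root symlinks and blessing the volume, might be sketched as follows. This is an approximation under assumptions rather than the CCC script itself: the function name is invented, the symlink list covers only the usual /etc, /tmp and /var links into /private, and depending on the machine bless may need further options beyond --folder. It prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: post-backup steps to make the backup volume look bootable.
# Function name and volume path are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

finish_bootable_backup() {
    volume="$1"   # e.g. /Volumes/Backup

    # On Mac OS X, /etc, /tmp and /var are symlinks into /private.
    # Remove a stale link first, then recreate it as a relative link.
    for name in etc tmp var; do
        run rm -f "$volume/$name"
        run ln -s "private/$name" "$volume/$name"
    done

    # Mark the system folder on the backup volume as bootable.
    run bless --folder "$volume/System/Library/CoreServices"
}

finish_bootable_backup /Volumes/Backup
```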

Took me "just" two evenings to sort it all out :-)
