Tuesday, February 05, 2008

Time Machine + FileVault experiences

I was reluctant to use Leopard's Time Machine "set it and forget it" backup because I also use FileVault (which basically mounts an AES-encrypted disk image in place of your home folder). The web was full of warnings that Time Machine does not work with FileVault, or that it does but backs up your home folder only when you log out, and that you lose the ability to restore individual files through the GUI and need to fiddle with manually mounting the backed-up images if you want to fish something out of them. Seeing, however, how undisciplined I was getting with my manual backup routine, I decided it couldn't be worse than having no backups, and went ahead and gave it a try.

At first, I was surprised to see that, contrary to what was advertised, it did actually back up the encrypted disk image that hosts my home folder. It did it every hour. Every hour, it'd push the 30GB disk image over to the other drive. That filled it up, well, rather quickly.

Digging around, it turned out that the reason for this was that I kept using the Tiger-created FileVault, which uses a single file for the disk image. And Time Machine will happily back it up. 30GB/hour.

So, the next step was trying to upgrade to the Leopard FileVault format, which uses a new "sparsebundle" disk image format. It is basically a folder with 8-MB files that I'll call "stripes" (Apple's own name for them is "bands") that hold the contents of the disk, plus some other files for tracking what's where. The ugly part is that in order to "upgrade" FileVault, you have to actually turn it off first (so it unpacks your disk image contents onto the main filesystem), then re-enable it. I left it to decrypt overnight (it probably only took an hour, but I left it there and went to sleep), then re-encrypted in the morning (which took 40 minutes for 15GB of content). And then came a secure wipe of the free disk space.

An immediate enormous benefit is that my disk image shrank from 30GB to 15GB. That's right: my old disk image took 30GB even when it was hosting only 15GB of content, and no amount of compacting would've taken it lower. And it wasn't because of filesystem slack - inspecting the image with Disk Utility showed that there was indeed 15GB in there reserved but not used.

Now it's only 15GB, as I would expect it to be, with another 15GB reclaimed on my HDD. Hooray.

Another enormous benefit is that I no longer have Time Machine push 30GB over FireWire every hour. Whenever I log out, though, FileVault will compact the disk image (as it did in Tiger), and then Time Machine will back up only those 8-MB stripes that actually changed, so the process is rather quick.
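To make the "only the changed stripes" point concrete, here is a minimal sketch that lists the files in a sparsebundle's "bands" folder modified since some earlier point in time. It is an illustration only and not how Time Machine itself detects changes; the function name changed_bands and both arguments are my own placeholders.

```shell
#!/bin/sh
# Illustration only: list the band files modified after a marker file.
# "bands_dir" and "marker" are hypothetical paths supplied by the caller.
changed_bands() {
    bands_dir=$1   # e.g. the "bands" folder inside a .sparsebundle
    marker=$2      # a file whose modification time records the last backup
    # -newer matches files modified more recently than the marker
    find "$bands_dir" -type f -newer "$marker" | sort
}

# Example call (paths are placeholders):
#   changed_bands /path/to/image.sparsebundle/bands /path/to/last-backup-marker
```

With a single-file disk image the same check would always report the one 30GB file, which is the whole difference between the two formats as far as backups go.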

It is easy to understand why Time Machine doesn't back up the FileVault home directory while it's mounted - it would be too easy to back it up in an inconsistent state as data is shuffled across stripes. Of course, I wish Apple engineers had had more time to think about this and had solved it in a smarter way. I myself could tell them two better ways to handle this:

One: add a shadow file to the disk image while backing up to hold concurrent changes, then merge the changes into the image file once the backup finishes. The underlying BSD foundation of the OS supports this. It would, however, probably create a perceptible temporary freeze of the system while the changes in the shadow file are merged with the disk image.
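For an ordinary disk image that you mount yourself, the first half of that idea can be sketched with hdiutil's -shadow option, which diverts writes into a separate file and leaves the image file untouched. This is only a sketch under assumptions: the function name and both paths are placeholders, FileVault's own mount is performed by the login system rather than by a script like this, and the merge step is not attempted here.

```shell
#!/bin/sh
# Sketch: attach a disk image so that writes land in a separate shadow file
# and the image file itself stays unchanged while it is being copied.
# Both paths are hypothetical placeholders supplied by the caller.
attach_with_shadow() {
    image=$1    # the disk image to mount
    shadow=$2   # where hdiutil should record blocks written while mounted
    hdiutil attach "$image" -shadow "$shadow" -nobrowse
}

# Example call (macOS only, paths are placeholders):
#   attach_with_shadow /path/to/home.sparseimage /path/to/home.shadow
```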

Two: create a similar encrypted disk image on the backup drive, mount it when the backup reaches the home folder, and just perform the whole Time Machine backup procedure between the two disk images. I actually had a homegrown solution that did precisely this using rdiff-backup back on Tiger. In fact, when I first heard of Time Machine, I sort of hoped Apple would base it on rdiff-backup, and use this method to handle FileVault accounts.

rdiff-backup has the advantage that it can incrementally back up small changes in large files using the rsync algorithm (Time Machine copies the whole modified file each time), and my method also preserved this incremental backup property in FileVault accounts, on a per-file basis, preserving both filesystem and backup semantics. I guess they lacked one smart guy in the engineering division for Time Machine, who was probably busy helping the iPhone division make their deadline... Oh well.

Anyway, now with tolerably fine-grained FileVault backups, I'm happy. Yes, I need to log out in order for my home folder to get backed up, but I had been doing this for a while anyway, using CCC or Disk Utility to copy the whole internal disk to an external one. I used to do backups once a week; now I get an automatic backup of the system every hour (which could come in handy if, say, a software install ever goes awry; that has never happened to me on a Mac, though), and an automatic backup of my home directory whenever I log out (which is at least once a week). Of course, the majority of my machine's state change happens in my home folder, so I would prefer its backup to be more frequent than the system backup, but such are compromises - I can't afford to run without FileVault.

(You might ask what happened to my homegrown rdiff-backup solution. It fell victim to my switch from a PowerPC to an Intel Mac, as it would've required me to recompile a bunch of GNU stuff from source (rdiff-backup and its paraphernalia), which I didn't have time to do at the time of the switch, so it fell into oblivion...)

11 comments:

Matthew Phillips said...

Thanks for posting that info. I am about to switch to Time Machine backups and am in exactly the same situation, so knowing about the Tiger FileVault thing will save me a lot of time.

Cheers,

Matthew.

Anonymous said...

Nice post, thanks!


-Tom

KP said...

If you install MacPorts, you can then install rdiff-backup with one command...

Guy said...

As Kevin points out above, installing rdiff etc. should be a breeze using MacPorts.

I'd appreciate it if you could post your homegrown solution for rsyncing the disk images.

Thanks,
Guy

Attila Szegedi said...

As per Guy's request, here's how I was doing an rdiff-backup into disk images:

1. Create an encrypted disk image in the same location on the backup volume as it is on the primary volume (in my case /Users/.aszegedi).
2. Mount it once through Finder to force it to ask for the password, and make sure you tell it to store the password in Keychain (that way hdiutil will automatically get the password).
3. In your backup script, put:

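# Create the mount point on the backup volume (-p: no error if it already exists)
mkdir -p /Volumes/Macintosh\ HD\ 1/Users/aszegedi
# Mount the backup's encrypted image there; the password comes from the Keychain
hdiutil attach /Volumes/Macintosh\ HD\ 1/Users/.aszegedi/aszegedi.sparseimage -owners on -nobrowse -mountpoint /Volumes/Macintosh\ HD\ 1/Users/aszegedi

# Back up the live home folder into the mounted image
rdiff-backup /Users/aszegedi /Volumes/Macintosh\ HD\ 1/Users/aszegedi

# Unmount the backup image
hdiutil detach /Volumes/Macintosh\ HD\ 1/Users/aszegedi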

(Note that since my backup volume is also named "Macintosh HD", its mount name under /Volumes is "Macintosh HD 1". Also, you'll want to use your login name instead of "aszegedi".)

That should be it.

Attila Szegedi said...

Additionally, this script was last used on Mac OS X 10.4. Under 10.5 you'll probably have a *.sparsebundle instead of a *.sparseimage.

Guy said...

Do you mean adding these to a standalone (cron?) script, or is there some way of customizing the TM backup script?

Attila Szegedi said...

Standalone - as I said, I used it on 10.4, which didn't have Time Machine. I never got around to integrating it with cron, mind you; I just wrote a script, put it into the path, and got into the habit of running it from Terminal whenever I left the machine for a while :-)

I'm not aware that there are any scripting/customization abilities in Time Machine.

Unknown said...

Or option 3: when you're logged in, Time Machine could back up the unencrypted files just like on a normal system. You'd be able to browse through your Time Machine backup just like on any other system. After all, you're probably more worried about losing your laptop in public than someone breaking into your home and stealing hard drives.

If you're worried about having an unencrypted version of your files on a hard drive at home, maybe the Time Machine backup itself could be stored on an encrypted disk image.

lucidsystems said...

You may be interested in LBackup which supports backups of FileVault enabled accounts.

Attila Szegedi said...

Thanks for the suggestion; I'll look into it. In the meantime, I have discovered CrashPlan, which can also back up the FileVault of a single logged-in user (that's good enough for me). It also supports multiple destinations, so now I use Time Machine + CrashPlan, and CrashPlan is backing up to both another computer and an external drive ('cause you can't have enough backups).