Online Backup?

Dear Lazyweb,

Recently, I have been scanning and shredding financial documents dating back to 2000. This has freed up several file-cabinet drawers’ worth of space, which is good. I am not sure that I will ever need DSL, electricity, and itemized credit card and cellphone bills going back 8 years, but it felt wrong to just shred them without keeping at least a digital backup.

Currently, documents are scanned onto the hard drive of a G5 iMac. They then get backed up to the TeraStation, which uses RAID mirroring to ensure that a good copy is always on two of its internal disks (in addition to the copy on the Mac). Because the data exists on three drives, this is an excellent insurance policy against hard drive crashes. It is not a great policy against fire, flood, or theft, though, because everything resides in the same room of the house.

So my question to you, dear Lazyweb, is what should I be using for an online backup? I have several requirements:

1. It needs to work on a Mac. I don’t care whether it is a GUI or a shell script, but it must run on a Mac.

2. The data needs to be encrypted as it is sent out to the online backup and decrypted when it returns. It should never be sitting as “plain text” on a remote machine because this is perfect identity theft source material.

3. It needs to be incremental. The financial data itself is relatively small right now, but I will likely want to scale up this solution to other files, too.

4. Cheap-to-free is best. Bonus points if it works with Dreamhost (or any random shell account) because I am already a customer. There’s no way I’m paying $99/year for dot-Mac.

5. It needs to be reliable. I’ve seen a few hacky dot-Mac replacements that require you to recompile a special Apache server, patch your OS, or whatever. These are things that will probably break when the OS gets bug fixes. I want a real solution that uses proper APIs, not funny hacks that I’ll have to worry might break in the not-too-distant future.
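Requirement 3 (incremental) mostly comes down to knowing which files changed since the last run. A minimal sketch, assuming a per-file content hash is an acceptable change signal: it records a SHA-256 digest for every file under a folder in a JSON manifest and compares it against the previous run’s manifest. The function names and manifest format are hypothetical, not taken from any existing backup tool; tools such as rsync do their own change detection.

```python
import hashlib
import json
from pathlib import Path

CHUNK = 1 << 20  # read files 1 MiB at a time so large scans don't fill memory


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, as a hex string."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def scan(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, with / separators) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Files that are new, or whose contents differ from the previous manifest."""
    return sorted(name for name, digest in new.items() if old.get(name) != digest)


def deleted_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Files present in the previous manifest but gone now."""
    return sorted(set(old) - set(new))


def load_manifest(path: Path) -> dict[str, str]:
    # No manifest yet means this is the first (full) backup.
    return json.loads(path.read_text()) if path.exists() else {}


def save_manifest(path: Path, manifest: dict[str, str]) -> None:
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

Only the files returned by `changed_files` would need to be encrypted and uploaded. Saving the new manifest only after the upload succeeds means a failed run is simply retried next time, and keeping the manifest outside the scanned folder stops it from listing itself.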

Here are two backup systems I have used (or am currently using) that fail to meet some of these requirements:

1. rsync over ssh. I use rsync to back up things around the house. It’s great, it’s incremental, it can be tunneled over SSH, and can be put on an automated schedule. It does not (to the best of my knowledge) store things in an encrypted format.

2. WebDAV. I have on occasion used Dreamhost’s WebDAV to mount a network drive and copy stuff to and from it, much like you would with an iDisk from dot-Mac. This can be done pretty easily from a shell script (using some AppleScript glue to have the shell script tell Finder to mount and unmount the drive). It’s convenient and even lets you get to the files from a web browser. Like rsync, though, it does not encrypt anything on the server side. I suppose I could create an encrypted DMG disk image, store it on the WebDAV disk, and rsync to that, but something tells me that mounting an encrypted DMG over a broadband network share will be molasses-slow.
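For reference, the rsync-over-ssh setup described above is easy to drive from a scheduled script. A minimal sketch, assuming `rsync` and `ssh` are on the PATH and key-based ssh login is already configured; the local path and the `user@example.com` host are placeholders. It uses only rsync’s standard `-a` (archive), `-z` (compress), `-e ssh`, and optional `--delete` options. As noted above, the copy that lands on the remote machine is still plaintext.

```python
import shlex
import subprocess


def rsync_over_ssh_command(src: str, dest: str, delete: bool = False) -> list[str]:
    """Build an rsync command that copies src to dest, tunneled over ssh."""
    # -a: archive mode (recurse, preserve times/permissions); -z: compress in transit
    cmd = ["rsync", "-az", "-e", "ssh"]
    if delete:
        # Remove files on the remote side that no longer exist locally.
        cmd.append("--delete")
    # A trailing slash on src copies the folder's contents, not the folder itself.
    cmd += [src, dest]
    return cmd


def run_backup(src: str, dest: str, delete: bool = False) -> None:
    """Run the transfer; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(rsync_over_ssh_command(src, dest, delete), check=True)


if __name__ == "__main__":
    # Placeholder paths: print the command rather than running it.
    print(shlex.join(rsync_over_ssh_command("/Users/me/Scans/", "user@example.com:backup/scans/")))
```

Passing the command as a list (rather than one shell string) sidesteps quoting problems with spaces in folder names; `run_backup` is what a cron or launchd job would call.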

I have looked a little bit at Mozy, but their buzzword-heavy website makes it hard to figure out if this is what I want. Plus, I’d much prefer a free Open Source solution, as I feel I could trust it more than some company that gives me a black box app and says “trust us; everything’s encrypted.”

Thoughts? Suggestions?

Posted in: Dear Diary Questions

9 thoughts on “Online Backup?”

  1. You could always email it to yourself on a Gmail account. Seriously, there is almost too much space on one account… And if even that’s not enough, you can make as many accounts as you want. There is also Photobucket, if these files are image files.

  2. As easy as emailing something to yourself is for the quick file or two, the system breaks down when you’re dealing with several hundred (if not thousands of) PDFs. While it can sort of be automated (with the Unix mail command), keeping track of the incremental updates is going to require a bunch of custom code. Additionally, because I’m hesitant to store them online as plaintext (e.g. anyone who hacks my Gmail or discovers a future security hole could download all of my financial data), they need to be encrypted at backup and decrypted on restore, which is a bunch more custom code.

    While I am probably capable of writing all of that code, I’d rather put my trust in code that others have written and (more importantly) have beta-tested. A system that’s been used by many others before me is more likely to be reliable than something I craft myself.

  3. I think rsync is the way to go. I already use rsync over ssh to Dreamhost from Mac OS X. This is only for website files, so I can’t guarantee that it won’t subtly mangle some OS X file properties if you try to restore, but it should be close enough.

    I was about to say that encrypting files would, by definition, fuck up rsync’s relatively painless updates. But it seems that there is a project called rsyncrypto that trades somewhat weaker encryption for keeping changes local, so a small change to a plaintext file causes only a small change to its encrypted version. I’ve never tried it (this is just me Googling), but I might, now that I’ve found it.

  4. I use Mozy, and it works as advertised.

    Have you thought about $1.60/GB, totally secure (they even guarantee HIPAA and S-Ox compliance), and works with Open Source tools (as the name would imply)…

  5. If I had your requirements, I would use #2 but store the DMG on the LAN storage and have a cron job or the like upload a copy of the DMG overnight. Maybe have one DMG for each year to save bandwidth if it is too big a file. (Basically, I’m trying to re-jigger my own solution without paying for .Mac’s automatic iDisk syncing.)

  6. There’s an option as yet unmentioned here: why not store the sensitive files encrypted locally, and use rsync + ssh for the transport? It’s not like you have to refer to 8 years’ worth of financial documents very frequently, and it makes your data safer in the event of theft on your end. That’s how I do everything, and it’s worked like a charm so far for about 108 GB worth of backed-up stuff from 3 machines.

  7. @Fimmtiu: I’m trying to avoid a hybrid solution, if at all possible, to keep things simple. My local backups are rsync’ed between the machine with the files and a NAS. I like having the exact same files on both the workstation and the NAS because it makes working with the NAS during a restore (something I’ve had to do once already) much easier. My options with local encryption are basically…

    1. Keep encrypted files on the workstation, sync directly to the NAS, and sync directly to Dreamhost. I like this, as it makes incremental backups easy, but the thorn here is that I’d like to leave the last year’s worth of data unencrypted on the local workstation because I often have to refer back to it. I’m really enjoying Leopard’s spacebar preview (Quick Look) for this and worry that decrypting each time might slow me down enough that I use the system less and less, eventually going back to giant filing cabinets. I’d also like to (eventually) expand this so that 100+ GB of audio files are backed up too, and I rather like the idea of everything being encrypted, not just private data. (This goes back to Schneier’s analogy: if you send all of your snail mail on postcards but use an occasional sealed envelope, anyone monitoring your mail is going to notice that envelope as an interesting piece of correspondence.)

    2. Keep plain files on the workstation, sync encrypted to the NAS, then mirror that online. I see this as basically the same as syncing unencrypted to the NAS and syncing encrypted to the internet. I’d probably be using the same app (Duplicity, as I mentioned recently). I’d like to keep the NAS unencrypted, a direct mirror of the local machine, as I mentioned above. Restoration is easy, and if someone gains physical access to one, they have physical access to the other, so having one encrypted and the other not won’t matter.

    Rewinding a bit, I think I’ll have to make a 1b:

    1b. The more I think about this, the more I think I may be able to pull it off with an encrypted DMG image. I could mount it, type the password once, access all the files, then unmount it when I’m done. That whole DMG would then rsync with Dreamhost. I just don’t know how well it will back up incrementally with rsync/ssh. I know that rsync is block based, and presumably DMG files are block based because they’re just emulating a physical drive (right…?), so that just might work… I’ll have to perform a couple of experiments to make sure everything behaves the way I think it does.
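    The experiment in 1b could start with something like this: keep a copy of the encrypted DMG, mount the original, add or edit one document, unmount it, and then count how many fixed-size blocks of the image changed. A minimal sketch, assuming 4 KiB comparison blocks and that the image’s contents are rewritten in place rather than shifted. rsync’s real algorithm uses rolling checksums that also catch shifted data, so this is only a rough estimate of what it would need to send. The script and its names are hypothetical.

```python
import sys
from pathlib import Path

BLOCK_SIZE = 4096  # assumption: 4 KiB comparison blocks; rsync chooses its own block size


def changed_blocks(old: Path, new: Path, block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Count blocks of `new` that differ from `old` at the same offset.

    Returns (changed, total). Blocks past the end of `old` count as changed.
    """
    changed = total = 0
    with old.open("rb") as f_old, new.open("rb") as f_new:
        while True:
            new_block = f_new.read(block_size)
            if not new_block:
                break
            total += 1
            # Once f_old is exhausted it returns b"", which never equals a non-empty block.
            if f_old.read(block_size) != new_block:
                changed += 1
    return changed, total


if __name__ == "__main__" and len(sys.argv) == 3:
    changed, total = changed_blocks(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"{changed} of {total} blocks differ (about {changed * BLOCK_SIZE} bytes to resend)")
```

    Run it as, for example, `python3 blockdiff.py before.dmg after.dmg` (hypothetical file names). If adding one PDF changes only a handful of blocks, rsync should handle the DMG well; if most of the image changes, every backup will effectively be a full upload.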
