I often say there are only two types of data: data that is backed up, and data that is waiting to be lost.

Even if you’re the most conscientious hax0r who always backs up your data, if your backups are stored in the same physical location as the source data, then your data is still “waiting to be lost” in the event of a fire, flood, theft, or other disaster. By combining Amazon’s low-priced S3 (Simple Storage Service) cloud storage with some excellent open source backup tools, you can now be more prepared than ever without spending a fortune.

This how-to demonstrates how I combined the following tools to automate my off-site backups:

  • Amazon S3: cheap, secure, redundant, off-site storage service
  • AutoMySQLBackup: free software to create backups of MySQL databases
  • Duplicity: free software that does smart backups to remote locations
  • GPG: allows encryption and signing of data for privacy
  • dt-s3-backup.sh: a slick shell script that ties all these tools together

Step 1: Set up your Amazon S3 Storage Bucket

I won’t walk through all the steps to do this, as Amazon makes it easy. Just sign up for their S3 service (you only pay for what you use), sign in, find the Security Credentials page and take note of your Access Key ID and your Secret Access Key. You’ll need them later. You should also set up an S3 Bucket to store your backups. Write down the name of your bucket for use in a later step.

Step 2: Download AutoMySQLBackup (optional)

If you don’t have any MySQL databases to back up, or you have your own preferred method of backing up your databases, you can skip this step. AutoMySQLBackup is a free utility that quickly and easily creates dumps of your MySQL data, which we’ll back up to Amazon S3 in a later step.

Download AutoMySQLBackup from SourceForge and run the simple install.sh script to set it up. I followed this excellent blog post to help me get AutoMySQLBackup configured and working. I had to make a few minor changes because I’m using a more current version of AutoMySQLBackup and some of the variable names in the config file were different, but it’s pretty straightforward. Once you’ve got it backing up your databases, you’re ready to move on.

Step 3: Download and Install Duplicity

Duplicity is the program that does most of the heavy lifting in this situation. It manages the actual file backup (full or incremental), compression, encryption, and the file transfer to any number of off-site storage locations. Lots of documentation is available online, in case your needs differ from the ones explained here. As always, Google is your friend. :)

To install Duplicity if you’re running Fedora, RHEL, or CentOS, it’s as simple as doing:

yum install duplicity

For Ubuntu or Debian users, do:

apt-get install duplicity

If you’re running some other flavor of Linux, refer to the Duplicity website for help installing.
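Before moving on, it can help to confirm that the tools used in the rest of this how-to are actually on your PATH. The following is only a minimal sketch and not part of any of the tools above; the function name missing_tools is made up for illustration.

```shell
# Hypothetical helper: print the name of every command that is NOT installed.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to ask whether a command can be found
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the tools this how-to relies on; no output means nothing is missing.
missing_tools duplicity gpg
```

If either name is printed, install that package before continuing.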

Step 4: Create a GPG Key for Backups

Because you’re going to be transferring your precious data over the Internet, and storing it in an off-site location that shouldn’t be, but still technically could be, accessed by snooping Amazon employees or hackers, it’s best to encrypt your data before sending it… “to the CLOUD!” Seriously, those commercials are so annoying.

Even if you already have a GPG key, I recommend creating a separate one just for backups. You should also keep a copy of this key in a secure location away from the server, so you’re never stuck without the ability to decrypt your data. Do:

gpg --gen-key

You can accept all the defaults, but make sure you use a passphrase when creating this key, since Duplicity will require it. After you’ve answered all the questions, the output should look something like this:

We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
.++++++++++++++++++++..+++++++++++++++++++++++++++++++++++++++++++++++++++++++.+++++
.+++++..+++++.+++++++++++++++++++++++++++++++++++++++++++++.....>.++++++++++................................+++++
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
+++++.++++++++++..++++++++++...++++++++++...+++++.+++++..+++++.+++++..+++++++++++++++.+++
++++++++++++..+++++++++++++++..++++++++++..+++++++++++++++++++++++++...+++++..+++++>+++
+++++++>.+++++>+++++......................+++++^^^
gpg: key 1F6C9247 marked as ultimately trusted
public and secret key created and signed.

gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   2  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 2u
pub   2048R/1F6C9247 2011-11-05
      Key fingerprint = FC81 D8E3 8090 EEE3 1D98  E000 045C D80E 1F6C 9247
uid                  Backup Key <backup@example.com>
sub   2048R/12D6A5B0 2011-11-05

Take note of your key’s public GPG Key ID, which is listed on the line where it says “key xxxxxxxx marked as ultimately trusted” (in this example, it’s 1F6C9247). You can also find your GPG key’s public ID with:

gpg --list-keys

which will spit out something like:

pub   2048R/1F6C9247 2011-11-05
uid                  Backup Key <backup@example.com>
sub   2048R/12D6A5B0 2011-11-05

You’ll see your key’s ID on the top row after the slash. Write it down (don’t worry, it’s not a security risk like a password) to refer to in the next step.
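If you would rather not read the ID off the screen, and you want a copy of the key stored somewhere safe, something like the following might do. It is a sketch under assumptions: the function names and output file names are made up, and the parsing expects the older listing format shown above (newer GnuPG releases print keys differently by default, so check the output before trusting it).

```shell
# Hypothetical helper: read `gpg --list-keys` output on stdin and print the
# ID of the first public key (the part after the slash on the "pub" line).
extract_key_id() {
    awk '/^pub/ { split($2, parts, "/"); print parts[2]; exit }'
}

# Hypothetical helper: export the public and secret halves of a key so they
# can be stored somewhere safe (NOT only on the machine being backed up).
# Usage: export_backup_key KEY_ID OUTPUT_DIR
export_backup_key() {
    if [ "$#" -ne 2 ]; then
        echo "usage: export_backup_key KEY_ID OUTPUT_DIR" >&2
        return 2
    fi
    (
        umask 077   # exported key files are readable by the owner only
        gpg --armor --output "$2/backup-key-public.asc" --export "$1" &&
        gpg --armor --output "$2/backup-key-secret.asc" --export-secret-keys "$1"
    )
}
```

For example, piping gpg --list-keys into extract_key_id prints the ID, and export_backup_key 1F6C9247 /some/removable/drive writes both key files there. Without the secret key and its passphrase, the encrypted backups on S3 cannot be restored, so keep the exported copy off the server.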

Step 5: Download and Configure dt-s3-backup Script

While trying to make all the aforementioned tools work together, I stumbled across a very cool script that already did it for me. This blog post explains the script, and the script itself is hosted on GitHub.

Download the script to your server (I put mine in /usr/local/bin) and then open it up in an editor. You’ll need to put the following in the appropriate locations inside the script:

  • AWS_ACCESS_KEY_ID: Your Amazon Access Key (duh!)
  • AWS_SECRET_ACCESS_KEY: Your Amazon Secret Access Key (double duh!)
  • GPG_KEY: Your GPG Key ID of the key you created in the previous step
  • ROOT: I changed this to just “/” so that I could back up anything on the system. You’ll pick the exact directories you want in a bit.
  • DEST: Since we’re backing up to Amazon S3, comment out the “file:” line, uncomment the “s3+http:” line, and put the name of the Amazon S3 bucket you created for backups in the first step. If your bucket name were “my.awesome.backups” then this line would be DEST="s3+http://my.awesome.backups/"
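Put together, the edited variables might look roughly like the following. This is a sketch with placeholder values, not a copy of the script: the credential strings are stand-ins, the key ID and bucket name are the examples used in this article, and the layout in your copy of the script may differ.

```shell
# Placeholder values only: substitute your own credentials, key ID, and bucket.
AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY_ID"
AWS_SECRET_ACCESS_KEY="YOUR_SECRET_ACCESS_KEY"

# ID of the backup-only GPG key (the example key from the previous step)
GPG_KEY="1F6C9247"

# Source directory handed to Duplicity; "/" lets INCLIST reach anything
ROOT="/"

# S3 destination enabled (the "file:" line in the script stays commented out)
DEST="s3+http://my.awesome.backups/"
```

Since the script now contains your secret key, it is worth making it readable only by the user who runs it (for example with chmod 700).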

Skip the INCLIST and EXCLIST options for now, and tinker with the STATIC_OPTIONS to your liking. These will simply be passed to Duplicity, so you can check the Duplicity docs for all the possibilities. I have mine set to STATIC_OPTIONS="--full-if-older-than 4W" which means my backup (which I run daily) will do incremental backups unless the last full backup is more than 4 weeks old, in which case it will do a full backup. I also kept the default CLEAN_UP_TYPE and CLEAN_UP_VARIABLE settings. Again, refer to the Duplicity docs for other options.

Finally, I also tinkered with the Logfile settings and Email Alert settings.

Step 6: Choose which directories to include and exclude

Use the INCLIST and EXCLIST sections of the dt-s3-backup.sh script to list which directories you want to include and exclude while doing your backups. Examples are shown in the script. Make sure that whatever directory you used to store your database backups with AutoMySQLBackup is listed in INCLIST. If you want hidden directories excluded, be sure to list them in EXCLIST. The following are my lists:

INCLIST=(  "/www/" \
           "/etc/" \
           "/home/" \
           "/root/" \
           "/usr/local/bin/" \
           "/usr/local/backups/db/" \
        )

EXCLIST=(   "/www/logs" \
            "/etc/selinux" \
            "/home/*/Download/" \
            "/root/*/Download/" \
            "/home/*/.*/" \
            "/root/.*/" \
            "/home/*/logs" \
            "/home/*/Maildir" "/home/*/mail" "/root/Maildir" "/root/mail" \
        )

These settings work for me, but there’s no guarantee they will work for you. It’s your data, so you should completely understand what is and isn’t going to be backed up.
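To get a feel for how lists like these end up on the Duplicity command line, here is a rough sketch. It is my own illustration rather than the script’s actual code, and the name selection_flags is made up. The one fact it leans on is that Duplicity checks its --include and --exclude rules in order and the first match wins, so an exclude for a subdirectory has to come before the include for its parent.

```shell
# Hypothetical helper: print Duplicity file-selection flags, one word per line.
# $1: space-separated include paths, $2: space-separated exclude patterns
# (assumes no path contains spaces).
selection_flags() (
    set -f   # keep patterns such as /home/*/logs literal instead of expanding them
    # Excludes come first, because the first matching rule wins
    for pattern in $2; do printf '%s\n' --exclude "$pattern"; done
    for path in $1; do printf '%s\n' --include "$path"; done
    # Anything not explicitly included above is left out
    printf '%s\n' --exclude '**'
)

selection_flags "/www/ /etc/" "/www/logs /home/*/logs"
```

With these sample arguments, the output starts with --exclude and /www/logs and ends with --exclude and **, which is the order Duplicity needs in order to skip /www/logs while still backing up the rest of /www/.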

Step 7: Do a Test Run

To test things out, find the following line in the dt-s3-backup.sh script and uncomment it (remove the #):

#ECHO=$(which echo)

As explained in the comments, this will run the script in test mode, which will spit out the full Duplicity command and send it to the email address you set up in the Email Alert settings.
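If you are curious how a one-line switch like this can turn a script into a dry run, the usual shell idiom looks something like the sketch below. This is an assumption about how such a switch typically works, not a copy of the script’s code; run_step is a made-up name and the Duplicity arguments are just the examples from this article.

```shell
# Sketch of the dry-run idiom: every real command is prefixed with ${ECHO}.
# When ECHO is empty the command runs; when ECHO is "echo" it is only printed.
run_step() {
    ${ECHO} "$@"
}

ECHO=$(command -v echo)   # test mode on (comment this line out for a real run)
run_step duplicity --full-if-older-than 4W / s3+http://my.awesome.backups/
```

With ECHO set, the last line only prints the duplicity command instead of running it, which is why test mode is safe to try before any real backup.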

Save your edited version of the script and run it with:

dt-s3-backup.sh --backup

Because it’s in test mode, it should think for a bit and then email you some output, which includes the full command that will be passed to Duplicity. If everything looks good, comment the ECHO line out again (put the # back), and go for it:

dt-s3-backup.sh --backup

Depending on many factors (the amount of data you’re backing up, the speed of your system, the speed of your connection to Amazon S3, the phase of the moon), you’ll have to wait for a bit. My system takes about 5 minutes to run a full backup.

If something goes wrong, check all your edits, and check the links to the other blog posts I’ve included. I won’t be any help answering support questions in this thread, because I’m not the author of any of these applications. :)

Step 8: Check Your Files

Assuming your backup worked, you can ask Duplicity to list all the files in your backup with:

dt-s3-backup.sh --list-current-files | more

Keep in mind that these will count as requests against your Amazon S3 allowance. You get a bunch of free ones, but managing your Amazon bill is completely your responsibility.

Other options for dt-s3-backup.sh are available in its README file. I recommend experimenting with them until you’re familiar with the ones you’ll need.

Step 9: Automate

Once everything is working as you want it, don’t forget to create cron jobs for AutoMySQLBackup and dt-s3-backup.sh. I dump my databases nightly, and I do an incremental backup with dt-s3-backup weekly. Use whatever settings work best for you.
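As a starting point, the schedule above could be expressed as two crontab entries. The sketch below just prints them so you can look them over first. The times are arbitrary, the automysqlbackup path is an assumption (use wherever your install put it), and cron_lines is a made-up name.

```shell
# Hypothetical helper: print two crontab entries matching the schedule above.
# The times and the automysqlbackup path are assumptions; adjust both.
cron_lines() {
    # Fields: minute hour day-of-month month day-of-week command
    printf '%s\n' \
        '0 2 * * * /usr/local/bin/automysqlbackup' \
        '0 3 * * 0 /usr/local/bin/dt-s3-backup.sh --backup'
}

cron_lines
```

The first entry runs the database dump every night at 02:00, and the second runs the backup every Sunday at 03:00, an hour later so the fresh dump is included. Once you are happy with the lines, add them to root’s crontab with crontab -e.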

Step 10: Provide Feedback

I always welcome your feedback, especially if you have suggestions for making the process in this article easier to do or understand. If you have a different backup method that works for you, please feel free to share it. Because I’m not the author of any of these utilities, however, I can’t provide support in using them. Check the links I’ve provided for support, or contact the application authors directly if you’re having trouble.

Good luck moving your data from “waiting to be lost” to “backed up.” I know I sleep better knowing I’m better prepared to deal with disaster!