s3cmd

Following on from my blog post about s3cmd, I've put together this demo of how I used s3cmd to create an automated backup of my MongoDB database.

Your situation may well differ from mine, so to clarify:

  • I'm running a single instance on Amazon EC2 with MongoDB installed
  • I have an Amazon S3 account set up with a bucket for backups
  • I want to back up MongoDB as part of a scheduled task without shutting it down

Note: The recommended practice is to run multiple instances of MongoDB as a replica set, where the secondary members keep copies of the primary's data. The main advantage is that if your primary fails, a secondary can automatically take over. In my case I just want a cheap hosting solution where I can experiment on small projects and host my blog, so a second instance of MongoDB is overkill.

Getting Started

I've already got MongoDB up and running and I've connected to my instance over SSH. I intend to:

  1. Install s3cmd
  2. Create a bucket and directory for backups on S3
  3. Get your Authentication details from Amazon
  4. Configure s3cmd
  5. Create a script to back up my database using mongodump
  6. Schedule backup using cron

Install s3cmd

On my EC2 Linux instance, s3cmd is available through apt-get (which applies to Debian/Ubuntu-based images), although the developer recommends downloading and installing the latest version. However, I found that apt-get installed a very recent version (1.0.0), so I didn't feel the need to go through the extra steps to get anything newer.

sudo apt-get install s3cmd
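
Before going further it's worth checking that everything the backup needs is on the PATH. This is a minimal sketch assuming a POSIX shell; missing_tools is a hypothetical helper, and the tool list (s3cmd, mongodump, tar) is simply what the backup script later in this post calls.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on the PATH.
# missing_tools is a hypothetical helper, not part of s3cmd or MongoDB.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can find the command
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Example: warn about anything the backup script relies on that is missing
missing=$(missing_tools s3cmd mongodump tar)
if [ -n "$missing" ]; then
    echo "Missing: $missing" >&2
fi
```

Running s3cmd --version afterwards shows which version apt-get actually installed.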

Create a bucket and directory for backups on S3

Log in to your Amazon Web Services account, open the AWS Management Console and select the S3 service. Click 'Create Bucket' and give it a unique name (prefixing it with your name helps guarantee this), then use 'Create Folder' to add a folder, which I have named 'backups'.
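
Bucket names are shared across all of S3, which is why prefixing your name helps, and S3 also restricts which characters a name may use. Below is a minimal pre-check, assuming only the core rules (3 to 63 characters; lowercase letters, digits, hyphens and dots; starting and ending with a letter or digit). The real rules have a few more restrictions, for example names must not look like IP addresses, and valid_bucket_name is a hypothetical helper.

```shell
#!/bin/sh
# Succeed if $1 passes the core S3 bucket naming rules.
# Hypothetical helper; it checks only the basic rules, not every S3 restriction.
valid_bucket_name() {
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$'
}

if valid_bucket_name "jimib-backups"; then
    echo "jimib-backups looks OK"
fi
```

S3 has no real directories: a folder created in the console is just a key prefix, which is why s3cmd can upload straight to s3://bucket/folder/.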

Get your Authentication details from Amazon

Whilst still logged into AWS, click 'Security Credentials' in the drop-down under your name at the top right. Under Access Credentials you will find a tab titled 'Access Keys', where you can find your 'Access Key ID' and 'Secret Access Key' (click 'Show' to display the Secret Access Key).

Configure s3cmd

Back on our server instance, we use the details from 'Get your Authentication details from Amazon'. As we are going to use cron, we need to make sure we are logged in as the root user (the same user cron will run the backup as), so that when cron runs s3cmd it picks up the configuration we are about to create.

sudo su
s3cmd --configure
# Access Key: [Enter the 'Access Key ID' from our Amazon Security Credentials]
# Secret Key: [Enter the 'Secret Access Key' from our Amazon Security Credentials]
# Encryption password: [If you want backups sent to Amazon to be encrypted, enter a passphrase here; I left mine blank]
# Path to GPG program [/usr/bin/gpg]: [I wasn't using encryption, so I left the default setting]
# Use HTTPS protocol [No]: [It says this will be slower, but I wanted my data encrypted during upload, so I entered 'Yes']
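
s3cmd --configure saves its answers to .s3cfg in the current user's home directory, so running it as root produces /root/.s3cfg, which is the file root's cron jobs will find. To double-check a setting without opening the file (and putting the secret key on screen), a small helper can read a single value. This is a minimal sketch, assuming the file uses the key = value lines s3cmd writes (such as use_https); s3cfg_value is a hypothetical helper.

```shell
#!/bin/sh
# Print the value of one key from an s3cmd config file (default: ~/.s3cfg).
# Hypothetical helper; assumes "key = value" lines as written by s3cmd --configure.
s3cfg_value() {
    key=$1
    file=${2:-$HOME/.s3cfg}
    # Split each line on the first "=", trim spaces, print the first match only
    awk -v k="$key" '
        {
            pos = index($0, "=")
            if (pos == 0) next
            name = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            sub(/^[ \t]+/, "", name); sub(/[ \t]+$/, "", name)
            sub(/^[ \t]+/, "", value); sub(/[ \t]+$/, "", value)
            if (name == k) { print value; exit }
        }' "$file"
}

# Example: confirm root's configuration uploads over HTTPS
# s3cfg_value use_https /root/.s3cfg
```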

Create a script to back up my database using mongodump

I adapted a script I found in a tutorial to fit my particular requirements.

Breakdown of what the script is doing:

  1. MONGODUMP_PATH holds the path to where mongodump is installed AND the admin username and password I added to protect my MongoDB.
  2. TIMESTAMP is created to give each export and backup a unique name
  3. S3_BUCKET_NAME is the name of my bucket on S3
  4. S3_BUCKET_PATH is the directory I created to hold my backups; if you are saving them to the root of the bucket, this would be blank
  5. Run the dump by calling $MONGODUMP_PATH
  6. Rename the dump
  7. Tar the dump
  8. Use s3cmd to upload it to Amazon S3
  9. Clean up by deleting the directory and tar we created
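
#!/bin/bash
# Stop at the first failing command, so a failed dump is never uploaded
# (and the local files are kept if the upload fails)
set -e

MONGODUMP_PATH="/usr/bin/mongodump --username ##### --password ########"
TIMESTAMP=`date +%F-%H%M`
S3_BUCKET_NAME="jimib-backups"
S3_BUCKET_PATH="mongodb"

# Create backup (mongodump writes to ./dump in the current directory)
$MONGODUMP_PATH

# Add timestamp to backup
mv dump mongodb-$TIMESTAMP
tar cf mongodb-$TIMESTAMP.tar mongodb-$TIMESTAMP

# Upload to S3: the trailing / makes s3cmd keep the local file name,
# and ${S3_BUCKET_PATH:+...} skips the directory when S3_BUCKET_PATH is blank
s3cmd put mongodb-$TIMESTAMP.tar s3://$S3_BUCKET_NAME/${S3_BUCKET_PATH:+$S3_BUCKET_PATH/}

# Delete everything when we're finished
rm -rf mongodb-$TIMESTAMP
rm -f mongodb-$TIMESTAMP.tar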
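
The breakdown says S3_BUCKET_PATH would be blank when saving to the root of the bucket. Gluing the pieces together as s3://$S3_BUCKET_NAME/$S3_BUCKET_PATH/file would then give s3://bucket//file, with a stray slash in the object key. Here is a minimal sketch of building the destination so both cases work, using the shell's ${var:+word} expansion; s3_destination is a hypothetical helper.

```shell
#!/bin/sh
# Build an s3cmd destination URI from a bucket, an optional directory and a file name.
# Hypothetical helper: ${2:+$2/} adds "directory/" only when a directory is given.
s3_destination() {
    echo "s3://$1/${2:+$2/}$3"
}

TIMESTAMP=$(date +%F-%H%M)
s3_destination "jimib-backups" "mongodb" "mongodb-$TIMESTAMP.tar"
```

The result can be passed straight to s3cmd put as the destination.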

Note: I'd like to extend this script a little further so that it automatically deletes old backups from the Amazon S3 bucket once it has finished uploading.
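
A possible starting point, sketched under some assumptions rather than tested against a live bucket: because TIMESTAMP uses date +%F-%H%M, the backup names sort in date order, so old backups can be picked out by comparing names as text. The hypothetical older_than function reads object names on standard input and prints those dated before a cutoff; the commented usage assumes GNU date's -d option and that the object URI is the fourth column of s3cmd ls output, and the 30-day figure is only an example.

```shell
#!/bin/sh
# Read backup names (mongodb-YYYY-MM-DD-HHMM.tar, optionally with an s3:// prefix),
# one per line, and print those dated strictly before the cutoff date $1 (YYYY-MM-DD).
# Hypothetical helper; relies on the ISO date in the name sorting as text.
older_than() {
    cutoff="mongodb-$1"
    while IFS= read -r name; do
        base=${name##*/}                      # drop any s3://bucket/dir/ prefix
        case $base in
            mongodb-????-??-??-????.tar) ;;   # only consider our own backups
            *) continue ;;
        esac
        # Byte-wise string comparison: an earlier date sorts before the cutoff
        if LC_ALL=C expr "$base" \< "$cutoff" >/dev/null; then
            echo "$name"
        fi
    done
}

# Possible usage (not tested here):
# CUTOFF=$(date -d '30 days ago' +%F)
# s3cmd ls s3://jimib-backups/mongodb/ | awk '{print $4}' |
#     older_than "$CUTOFF" | while read -r obj; do s3cmd del "$obj"; done
```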

Schedule backup using cron

The last step is to set our script to run on a schedule. We are on Linux, so we can use cron. Cron should already be running under the root user, which you can check using the following command:

ps aux | grep cron

Making sure we are logged in as root as well, we can configure cron:

sudo su
crontab -e

The first time you run it, it may prompt you to choose an editor from those installed. After selecting an editor, add a line like this:

0 0 * * * /home/jimib/scripts/mongo-backup

In the example above I've saved my backup script to '/home/jimib/scripts/mongo-backup' and stated that I want it to run at minute 0 of hour 0 (midnight) on every day of the month, in every month, on every day of the week. The script must be executable for cron to run it (chmod +x /home/jimib/scripts/mongo-backup). There's a decent explanation of this format on Wikipedia; pay particular attention to the special characters, which give you more flexibility over the timing of your schedule.
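
To see how the special characters combine, here is a toy matcher for a single cron field. It is a simplified sketch, not how any real cron daemon is implemented: it understands *, lists (a,b), ranges (a-b) and steps (*/n or a-b/n), and ignores names such as MON and non-standard characters such as ?. The name field_matches is hypothetical.

```shell
#!/bin/sh
# Succeed if the number $2 is selected by the cron field $1.
# $3 and $4 are the field's allowed range, e.g. 0 and 59 for minutes.
# Toy sketch: handles *, lists, ranges and steps; no names, no "?".
field_matches() (
    field=$1 value=$2 low=$3 high=$4
    set -f          # stop "*" from being expanded as a file name pattern
    IFS=,           # split the field into its comma-separated parts
    for part in $field; do
        step=1
        case $part in
            */*) step=${part#*/}; part=${part%/*} ;;
        esac
        case $part in
            '*') start=$low; end=$high ;;
            *-*) start=${part%-*}; end=${part#*-} ;;
            *)   start=$part; end=$part ;;
        esac
        # Selected if inside the range and a whole number of steps from its start
        if [ "$value" -ge "$start" ] && [ "$value" -le "$end" ] &&
            [ $(( (value - start) % step )) -eq 0 ]; then
            exit 0
        fi
    done
    exit 1
)
```

A real cron checks all five fields, and when both day-of-month and day-of-week are restricted it runs the job if either one matches.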

For example:

# Run myscript every 15 minutes
*/15 * * * * /myscript
# Or
0,15,30,45 * * * * /myscript

# Run myscript at midnight every weekday (Monday to Friday)
0 0 * * 1-5 /myscript

Note: The question mark is a non-standard character that exists only in some cron implementations, where it is used instead of '*' to leave either day-of-month or day-of-week blank. The examples above stick to standard syntax so they work in a typical Linux crontab.