Friday, January 2, 2009

Debian Server Backup Script using Gmail



The third level of backup security, and one of the most important, is being able to store backups somewhere other than the production site, commonly known as keeping backups off-site. Taking advantage of free services like Gmail, it's now much easier to fulfill this third level of backup security and to have all your backups available through an internet connection. Here I show a homemade script that performs the following steps.

  1. Compress your web server, database and SVN repository
  2. Encrypt the compressed files with OpenSSL
  3. Send those files to a Gmail account
  4. Clean the Gmail account by removing old backups


1. Shell script for backup compression and encryption

a) Compression

This is the simple part. We back up the SVN repository by compressing the repository directory, and we do the same for the web server by compressing the www root directory. To back up the MySQL database we first do a mysqldump and then compress the dump file. To configure the compression process we have to define the following variables (a minimal example follows the list).

  • Backup Path: the path that will be compressed.

  • Archive Path: the path where we want to store the latest backups on the server.

  • Backup History: how many days of backups you want to keep on the server.
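
For example, a minimal sketch of this configuration (the paths and retention value below are hypothetical placeholders, matching the variables used in the full script):

# Hypothetical configuration values for the compression step
BACKUPDIR=/var/www        # Backup Path: the directory that will be compressed
ARCHIVEDIR=/root/backups  # Archive Path: where the .tgz archives are kept on the server
KEEPDAYS=2                # Backup History: days of archives kept locally
                          # (applied later as: find $ARCHIVEDIR -type f -mtime +$KEEPDAYS)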

b) Encryption

After generating the .tgz backup files we apply OpenSSL encryption, so we can send these files over the internet and be reasonably sure that no one else will be able to access them.
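
To restore, reverse the process: decrypt the file with the same passphrase and extract the archive. A minimal sketch (the file name below is hypothetical):

# Decrypt an encrypted backup and extract it (hypothetical file name)
openssl enc -aes-256-cbc -d -in 20090102-wwwServerBackup.tgz.enc \
    -out 20090102-wwwServerBackup.tgz -pass pass:<password>
tar xzvf 20090102-wwwServerBackup.tgz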

c) Script code

#!/bin/bash
# backups

# command paths
TAR=/bin/tar
DATE=/bin/date
FIND=/usr/bin/find
ECHO=/bin/echo
MYSQLDUMP=/usr/bin/mysqldump
RM=/bin/rm
OPENSSL=/usr/bin/openssl
PYTHON=/usr/bin/python
SENDGMAIL=/etc/cron.daily/gmailSend.py
CLEANGMAIL=/etc/cron.daily/gmailCleanup.py

# script variables
DATECODE=`$DATE +%Y%m%d`
ARCHIVEDIR=<backupArchivePath>

BACKUPDIR1=<webServerPath>
ARCHIVE1=$ARCHIVEDIR/$DATECODE-wwwServerBackup.tgz
LOGFILE1=/root/backups/$DATECODE-wwwServerBackup.log
ERRORFILE1=/root/backups/$DATECODE-wwwServerBackup.err.log

BACKUPDIR2=<subversionPath>
ARCHIVE2=$ARCHIVEDIR/$DATECODE-svnBackup.tgz
LOGFILE2=/root/backups/$DATECODE-svnBackup.log
ERRORFILE2=/root/backups/$DATECODE-svnBackup.err.log

BACKUPDIR3=$ARCHIVEDIR/db.sql
ARCHIVE3=$ARCHIVEDIR/$DATECODE-dbBackup.tgz
LOGFILE3=/root/backups/$DATECODE-dbBackup.log
ERRORFILE3=/root/backups/$DATECODE-dbBackup.err.log


# Create log and error files if they don't exist and clean them
$ECHO > $LOGFILE1
$ECHO > $ERRORFILE1
$ECHO >> $LOGFILE1
$ECHO >> $ERRORFILE1

$ECHO > $LOGFILE2
$ECHO > $ERRORFILE2
$ECHO >> $LOGFILE2
$ECHO >> $ERRORFILE2

$ECHO > $LOGFILE3
$ECHO > $ERRORFILE3
$ECHO >> $LOGFILE3
$ECHO >> $ERRORFILE3

# Backup www
$TAR czvf $ARCHIVE1 $BACKUPDIR1 >> $LOGFILE1 2>> $ERRORFILE1
$OPENSSL enc -aes-256-cbc -e -in $ARCHIVE1 -out $ARCHIVE1.enc -pass pass:<password>
$RM $ARCHIVE1

# Backup svn repos
$TAR czvf $ARCHIVE2 $BACKUPDIR2 >> $LOGFILE2 2>> $ERRORFILE2
$OPENSSL enc -aes-256-cbc -e -in $ARCHIVE2 -out $ARCHIVE2.enc -pass pass:<password>
$RM $ARCHIVE2

# Backup database
$MYSQLDUMP -u<user> -p<password> --opt --all-databases > $BACKUPDIR3
$TAR czvf $ARCHIVE3 $BACKUPDIR3 >> $LOGFILE3 2>> $ERRORFILE3
$RM $BACKUPDIR3
$OPENSSL enc -aes-256-cbc -e -in $ARCHIVE3 -out $ARCHIVE3.enc -pass pass:<password>
$RM $ARCHIVE3

# Remove files more than 2 days old
$FIND $ARCHIVEDIR -type f -mtime +2 -exec rm -f {} \;

# Email files to gmail
$PYTHON $SENDGMAIL $ARCHIVE1.enc
$PYTHON $SENDGMAIL $ARCHIVE2.enc
$PYTHON $SENDGMAIL $ARCHIVE3.enc

# Clean gmail account moving all backup emails older than 60 days to the trash folder
$PYTHON $CLEANGMAIL

# Decrypt command
# $OPENSSL enc -aes-256-cbc -d -in <archive>.enc -out <archive> -pass pass:<password>

2. Send the encrypted backup files to your Gmail account as Drafts

Using libgmail [sadly this library does not work anymore] makes sending files to a Gmail account terribly simple. The only problem you need to face is the attachment size limit of the Gmail account, which during my development was around 20MB. To work around this, we use the Linux split tool to cut big files into 19MB chunks and then email each piece (reassembling them with cat on restore is sketched after the script).

gmailSend.py
import sys, os
import libgmail
import glob

def sendFile(filename):
  accountName = <account name>
  accountPassw = <account password>
  gmailAccount = libgmail.GmailAccount(accountName, accountPassw)
  try:
    gmailAccount.login()
  except libgmail.GmailLoginFailure:
    print "\nLogin failed. (Wrong username/password?)"
  else:
    print "Log in successful.\n"

    if gmailAccount.storeFile(filename, label='serverBackup'):
      print 'File stored successfully'
    else:
      print 'Could not store'

def main(argv):
  # Check file size < 19M
  if os.path.getsize(argv[1]) <= 19000000:
    sendFile(argv[1])
  else:
    # Call split -d -b 19000000 argv[1] argv[1]+'.'
    os.system('split -d -b 19000000 %s %s.'%(argv[1], argv[1]))

    # Later use cat file.tgz.enc.00 file.tgz.enc.01 ... > file.tgz.enc to reassemble the split files
    # Go through all files created by the split and send each one
    filePieces = glob.glob('%s.*'%(argv[1]))
    for piece in filePieces:
      sendFile(piece)

if __name__ == '__main__':
  main(sys.argv)
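
If a backup was too big and had to be split, the pieces must be downloaded from the Gmail drafts and concatenated back together before decryption. A minimal sketch of the reassembly, assuming the hypothetical numbered pieces produced by split -d:

# Reassemble the numbered pieces produced by split -d (hypothetical file names)
cat 20090102-dbBackup.tgz.enc.00 20090102-dbBackup.tgz.enc.01 > 20090102-dbBackup.tgz.enc
# or, since the numeric suffixes sort correctly:
cat 20090102-dbBackup.tgz.enc.* > 20090102-dbBackup.tgz.enc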

3. Clean the Gmail account

After using the script for several months I ended up using all the space of my Gmail account, so I had to do a manual clean up. To avoid this awful task I've added a final cleanup Python script, responsible for moving all backup email drafts older than 60 days to the Gmail trash folder. As Gmail deletes all emails that have been in the trash for more than 30 days, we keep roughly three months of backup history.

gmailCleanup.py
import sys, os
import libgmail
import glob
import datetime

# This method parses the subject of the draft mail.
# It assumes that the subject of the email follows the backup script naming convention
# Subject example: 'FSV_01 20081201-dbBackup.tgz.enc'

def ThreadSubject2Datetime(subject):
   items = subject.split(' ')
   if len(items) == 1:
      return None

   items = items[1].split('-')
   if len(items) == 1:
      return None

   date = items[0]
   return datetime.datetime(int(date[:4]), int(date[4:6]), int(date[6:8]))

# This script will move to the trash all draft emails older than 60 days.
# As Gmail deletes all emails that have been in the trash for 30 days,
# we will keep roughly 3 months of backup history.
def main(argv):
   accountName = <account name>
   accountPassw = <account password>

   gmailAccount = libgmail.GmailAccount(accountName, accountPassw)
   try:
      gmailAccount.login()
   except libgmail.GmailLoginFailure:
      print "\nLogin failed. (Wrong username/password?)"
   else:
      folder = gmailAccount.getMessagesByFolder(libgmail.U_DRAFTS_SEARCH, allPages = True)
      for thread in folder:
         date = ThreadSubject2Datetime(thread.subject)
         if date:
            if date <= datetime.datetime.today() - datetime.timedelta(60):
               gmailAccount.trashThread(thread)

if __name__ == '__main__':
  main(sys.argv)

4. Known issues

  • The gmailCleanup.py script assumes the Gmail account was created only for backup purposes.
  • I'm sure the shell script can be improved, because I'm not an expert in bash scripting. You can use it as a starting point.
  • I haven't covered configuring the cron daemon to run the shell script on whatever schedule you want, because that is well documented on the internet; a minimal example is sketched after this list.
  • Finally, I think I should rewrite the whole process in Python.
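
As a minimal sketch of that missing cron configuration (the script path and schedule below are hypothetical; on Debian you could also simply drop the script into /etc/cron.daily to have it run once a day):

# Hypothetical /etc/crontab entry: run the backup script every night at 03:30
30 3 * * *   root   /root/backups/backup.sh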

5.  Acknowledgments

Thanks to Javier Loureiro for the startup ideas around backups and gmail.
