Making a backupbot

I've written a Bash script that makes backups of websites and, optionally, databases: backupbot. It can make a backup of a single website directory and/or database or back up multiple directories and databases defined in a file.

The README file documents what the script does, so I won't repeat that here. Instead, I want to talk about what I learned while writing the script.

Keep it simple

My Bash scripts tend to grow over time and become unwieldy. I often add things that aren't really all that useful and make the code difficult to maintain.

To give an example, the first version of the script checked if the directory in which backups should be stored exists and, if not, offered to create it. That's fine, but then I also needed to consider that the ownership of the new directory may need to be changed – the script currently needs root privileges but the destination directory may need to be owned by a particular user. That's still manageable, but it already felt like feature creep. Is offering to create the destination directory really worth the extra lines of code?

Things started to get quite complicated when I added an option to read data from a file. I added this option so that I would be able to back up all my websites via a cron job. That doesn't play nicely with the feature that offers to create the destination directory: if the "wrong" arguments are used, the cron job gets stuck waiting for user input. A possible solution was adding a 'quiet' option, but how would I then handle the directory ownership issue? And what if a user runs the script via a cron job without the --quiet option? The script would need to be quiet by default, but then I would need to add a --not-quiet option instead.

It nicely illustrates how a relatively small feature can quickly result in a lot of complexity! I think I've made a wise decision by stripping features like these in the latest version of the script. If the destination directory doesn't exist the script now simply exits with an error.
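
A check like that stays tiny. Here's a minimal sketch of the idea (the variable name and the error text are my assumptions, not backupbot's actual code):

# Sketch: $dest and the message wording are assumptions
if [ ! -d "$dest" ]; then
  printf '%s\n' "Error: destination directory $dest does not exist" >&2
  exit 1
fi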

Parsing all but the last argument

Related to my urge to add ever more functionality to scripts is that I got the hots for command line options. I try to limit them as much as I can, and in the case of the backupbot I think I found a sensible balance between functionality and complexity (there are only eight of them!).

Parsing the arguments also proved to be a bit of a challenge. I normally parse options with a standard while loop:

while [ "$#" -gt 0 ]; do
  case "$1" in
    --dest=*)
      dest="${1#*=}"
    ;;
    --database=*)
      database="${1#*=}"
    ;;
    *)
      printf '%s\n' "Error: invalid argument ($1)"
      print_help
    ;;
  esac
  shift
done

The loop checks if the number of command line parameters ($#) is greater than zero. If so, it checks if the first parameter ($1) is --dest, --database or something else (*). It then shifts the parameters and, if $# is still greater than zero, looks at the new first parameter.

The initial script had a couple of optional parameters and one that was required: the directory that should be backed up. So, a command could look like this:

backupbot --dest=/home/example/backups /var/www/example.com

As the last parameter is required I didn't want to use an option such as --directory. After all, the directory wasn't an option: it was required. But how do you then parse the arguments?

The answer is stupidly simple. Instead of checking if $# is greater than zero you can check if it is greater than one:

while [ "$#" -gt 1 ]; do
  ...
done
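
Everything before the last argument is handled in the loop, and whatever remains can be picked up afterwards. Putting it together, a minimal sketch of the pattern (the post-loop assignment is my assumption of how the leftover argument gets stored, not necessarily backupbot's exact code):

while [ "$#" -gt 1 ]; do
  case "$1" in
    --dest=*)
      dest="${1#*=}"
    ;;
    --database=*)
      database="${1#*=}"
    ;;
    *)
      printf '%s\n' "Error: invalid argument ($1)"
      print_help
    ;;
  esac
  shift
done

# Exactly one argument should be left: the required directory (assumed)
directory="$1"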

I then decided that the directory backup should be optional after all. The script should be able to make a backup of a database only. At the same time I still wanted to avoid adding a --directory option.

The difficulty with parsing the arguments now was that the last parameter may or may not be the directory to be backed up. Checking if $# is greater than one therefore no longer made sense. The solution I came up with is to check for any invalid arguments (--*) and to assume that the last argument (*) is the directory:

while [ "$#" -gt 0 ]; do
  case "$1" in
    --dest=*)
      dest="${1#*=}"
    ;;
    --database=*)
      database="${1#*=}"
    ;;
    --*)
      printf '%s\n' "Error: invalid argument ($1)"
      print_help
    ;;
    *)
      directory="$1"
    ;;
  esac
  shift
done

So, for each argument the case statement first tries the patterns that start with a double dash (including the catch-all --* for invalid options) and only then falls through to the final pattern, which is assumed to be the directory. I doubt this is the best and most elegant solution, but it works.
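
For example (the values are just illustrative), a database-only backup can now be invoked without a trailing directory argument:

backupbot --dest=/home/example/backups --database=example_com_db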

Reading data from a file and ignoring comments

I had never tried parsing data from a file. This turned out to be very easy.

The input file for backupbot should have five comma-separated fields, most of which can be empty. The example file looks like this:

# Full backup of example.com
example.com,/var/www/example.com/public_html,example_com_db,example,/home/example/backups
# Backup of the example.com database only
example.com,,example_com_db,example,/home/example/backups
# Backup of the example.com database only, using default values for 
# the owner and destination
example.com,,example_com_db,,

The five fields are read in a while loop into the variables $f_name, $f_directory, $f_database, $f_owner and $f_dest. They can then be processed as normal. For instance, in the example below I assign $f_directory to $directory and then call a function named validate_directory:

while IFS=, read -r f_name f_directory f_database f_owner f_dest; do
  directory=$f_directory
  validate_directory
  ...
done < "$file" # read from the input file; $file is an assumed variable name

So far so good. But what if you want to ignore blank lines or lines that start with a #-symbol in the input file (so that you can comment out individual backup jobs)? I found a good solution for that on Stack Exchange:

while IFS=, read -r f_name f_directory f_database f_owner f_dest; do
  case $f_name in
    ''|\#*)
      continue
    ;;
  esac
  ...
done < "$file" # again, $file is an assumed name for the input file

The one downside is that the first field in the input file ($f_name) can't be blank. The script will skip a line like this:

,/var/www/example.com,example_com_db,,

It's not a major issue, but something that does need to be fixed (as the 'name' field is optional).
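
One possible fix (a sketch of my own, not something backupbot does yet) is to test the whole line before splitting it into fields, so that a blank first field no longer looks like a blank line:

# Sketch: read the raw line first, then split it (not backupbot's current code)
while IFS= read -r line; do
  case $line in
    ''|\#*)
      continue
    ;;
  esac
  IFS=, read -r f_name f_directory f_database f_owner f_dest <<< "$line"
  ...
done < "$file"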

Changing directory with tar

The first version of the script backed up directories using tar -czf "$name" "$directory" (where $name is the name of the .tar.gz file and $directory the directory to be backed up). That was ugly when $directory included the full path to the directory. For security reasons tar strips the leading slash when a full path is provided. So, when you back up the directory /var/www/example.com the paths in the tarball will all start with var/www/example.com.
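
You can see GNU tar doing this: archiving an absolute path prints a warning and stores the member names without the leading slash.

tar -czf backup.tar.gz /var/www/example.com
# tar: Removing leading `/' from member names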

My initial workaround was to run the script from the parent directory of the one that needs to be backed up. For instance, to back up /var/www/example.com I would first change to the /var/www directory and then execute the script. That obviously didn't work when I added the option to read multiple backup jobs from a file.

To my delight I found that tar has a -C option that changes directory before it creates a tarball. It works like this:

tar -C "$directory" -czf "$name" .

If $directory is /var/www/example.com the paths in the tarball will all start with ./ instead of var/www/example.com/. Neat!
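
If you'd rather have the paths in the tarball start with the directory's own name (example.com/ in this case), a variation (my sketch, not what backupbot does) is to change into the parent directory and archive the basename:

# ${directory%/*}  strips the last path component -> /var/www
# ${directory##*/} keeps only the last component  -> example.com
tar -C "${directory%/*}" -czf "$name" "${directory##*/}"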