zfs-auto-snapshot.py

"It's simple enough to schedule creating a recursive snapshot of your system every 15 minutes. If you keep all of these snapshots, your pool fills up, however. Automated snapshots need rotating and discarding just like backup tapes."
—Lucas and Jude, FreeBSD Mastery: ZFS

Like any other piece of software (and information generally), this script comes with NO WARRANTY.

Public domain.

What is it?

zfs-auto-snapshot.py is a portable, minimal replacement for the standard snapshot automation shell script for ZFS on Linux. It creates and deletes ZFS snapshots at a frequency that you define.

It is written in Python so that the same script runs on both Linux and FreeBSD. Other operating systems have not been tested. Use at your own risk.

Installation

zfs-auto-snapshot.py is a standalone script that you copy to each machine where you want to use it. For example:

wget -N https://su.bze.ro/software/zfs-auto-snapshot.py
sudo install -m0700 ./zfs-auto-snapshot.py /usr/local/bin

(Updated, 2020-01-27) A version converted to Python 3 syntax is now available:

wget -N https://su.bze.ro/software/zfs-auto-snapshot.py3
sudo install -m0700 ./zfs-auto-snapshot.py3 /usr/local/bin

zfs-auto-snapshot.py requires Python. Check your operating system's documentation to see how to install Python if it is not already present on your machine.

Command-line Options

Notes

Unlike other ZFS management tools, zfs-auto-snapshot.py does NOT accept datasets as arguments or support any recursive operations. This script only performs the following:

zfs-auto-snapshot.py does not interfere with your other ZFS snapshots. In order for a snapshot to be pruned by zfs-auto-snapshot.py, it must start with "zfs-auto-snap-label", where label is a string you define. You can specify different labels for different categories of snapshots. Labels must be alphanumeric.
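The naming rule above can be sketched in a few lines of Python. This is an illustration only, not the script's actual code: the function names are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and a hyphen is assumed to follow the label so that a label such as "hourly" does not also match "hourly2".

```python
import re

PREFIX = "zfs-auto-snap-"  # prefix quoted in the text above


def is_valid_label(label):
    """Labels must be alphanumeric (ASCII letters and digits assumed)."""
    return re.fullmatch(r"[A-Za-z0-9]+", label) is not None


def is_managed(snapshot, label):
    """True if 'dataset@snapshot' belongs to the given label.

    Assumption: a hyphen separates the label from the timestamp.
    """
    _, sep, snap = snapshot.partition("@")
    return bool(sep) and snap.startswith(PREFIX + label + "-")
```

A manually created snapshot such as rpool/dataset@MyManualBackup1 never matches, which is why it is left alone during pruning.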

zfs-auto-snapshot.py creates new snapshots before deleting old ones. If you define --keep=N and already have N snapshots, you will briefly have N+1 snapshots before the oldest snapshot is deleted. This way you avoid having fewer than N snapshots. Use --delete-only if you wish to only remove snapshots.
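As a rough sketch of the create-then-prune order, the selection of snapshots to delete might look like the following. The function name is hypothetical, and the sketch assumes all names belong to one dataset and one label, so that sorting the names also sorts them by their UTC timestamp.

```python
def snapshots_to_delete(names, keep):
    """Return the oldest snapshot names beyond the newest `keep`.

    Assumes one dataset and one label, so lexicographic order of the
    names matches chronological order of their timestamps.
    """
    if keep < 0:
        raise ValueError("keep must not be negative")
    ordered = sorted(names)
    excess = max(len(ordered) - keep, 0)
    return ordered[:excess]
```

With --keep=3 and three existing snapshots, creating a fourth first and pruning afterwards returns exactly one name, the oldest, so the count never drops below three.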

Snapshots created by zfs-auto-snapshot.py are timestamped with the current date and time in Coordinated Universal Time (UTC). To avoid confusion, snapshot timestamps end in a 'Z' for "Zulu".

Snapshot timestamps have one-minute resolution, so two runs within the same minute would produce the same name. If zfs-auto-snapshot.py would create a snapshot with a duplicate snapshot name and timestamp, it exits with error code 100. Do not run it, or schedule it to run, more often than once a minute.
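To illustrate why runs less than a minute apart collide, here is a minimal sketch of how such a name could be built. The format string is inferred from the example snapshot names in this document (for instance zfs-auto-snap-hourly-2018-12-29_0100Z); the function name is hypothetical and the real script may construct names differently.

```python
from datetime import datetime, timezone


def snapshot_name(label, now=None):
    """Build a name like zfs-auto-snap-hourly-2018-12-29_0100Z.

    `now` should be a timezone-aware datetime; it defaults to the
    current time. The timestamp is always rendered in UTC.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%d_%H%M")
    return "zfs-auto-snap-{}-{}Z".format(label, stamp)
```

Because seconds are dropped, two calls at 01:00:05 and 01:00:50 UTC yield the same name, which is the duplicate case described above.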

Usage

Interactively as root, or with sudo or doas, run zfs-auto-snapshot.py. For example, to create a snapshot named "milestone" and keep the 10 most recent milestone snapshots:

# zfs-auto-snapshot.py --keep=10 milestone

Typically, zfs-auto-snapshot.py runs from crontab. The following example crontab will create and maintain a rotating set of five categories of snapshots with varying frequencies and retention values:

PATH="/command:/usr/local/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/X11R6/bin"
# min    hr   dom mon dow user cmd
15,30,45 *    *   *   *   root zfs-auto-snapshot.py --keep=4  frequent
0        1-23 *   *   *   root zfs-auto-snapshot.py --keep=24 hourly
0        0    *   *   1-6 root zfs-auto-snapshot.py --keep=7  daily
0        0    *   *   0   root zfs-auto-snapshot.py --keep=5  weekly
1        0    1   *   *   root zfs-auto-snapshot.py --keep=6  monthly

Starting from the top, this crontab defines your PATH environment variable then runs zfs-auto-snapshot.py: at 15, 30, and 45 minutes after every hour every day keeping the most recent 4 snapshots; hourly from 1 AM to 11 PM every day keeping 24; daily at midnight Monday through Saturday keeping 7; once a week at midnight on Sunday keeping 5; and at 12:01 AM on the first of every month keeping 6.

You can write this crontab to /etc/cron.d/zfs-auto-snapshot and begin creating snapshots on your system immediately. Remember to run chown root /etc/cron.d/zfs-auto-snapshot.

zfs-auto-snapshot.py can be used with other scheduling mechanisms. Be sure to define a usable PATH environment variable in your crontab/scheduler. This is not set in the script itself because Python can leak memory when manipulating os.environ on FreeBSD.

zfs-auto-snapshot.py will not create snapshots on any dataset that does not have the "com.sun:auto-snapshot" property set to "true", so be sure to check for this property on the datasets you want to automatically snapshot:

# zfs get com.sun:auto-snapshot

NAME                PROPERTY               VALUE  SOURCE
rpool               com.sun:auto-snapshot  -      -
rpool/ROOT          com.sun:auto-snapshot  -      -
rpool/ROOT/default  com.sun:auto-snapshot  false  local
rpool/boot          com.sun:auto-snapshot  true   local
rpool/home          com.sun:auto-snapshot  true   local
rpool/swap          com.sun:auto-snapshot  false  local
rpool/var           com.sun:auto-snapshot  -      -
rpool/var/log       com.sun:auto-snapshot  -      -

In the above example, only rpool/boot and rpool/home will have snapshots created and pruned automatically.
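The selection shown above can be approximated in Python by parsing the script-friendly output of zfs get. This is a sketch under assumptions, not the script's own code: the function names are hypothetical, the input is expected in the tab-separated form printed by `zfs get -H -o name,value`, and only the literal value "true" is treated as enabled.

```python
import subprocess

PROPERTY = "com.sun:auto-snapshot"


def enabled_datasets(zfs_get_output):
    """Pick dataset names whose property value is exactly "true".

    Expects tab-separated "name<TAB>value" lines.
    """
    selected = []
    for line in zfs_get_output.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition("\t")
        if value.strip() == "true":
            selected.append(name)
    return selected


def query_zfs():
    """Run zfs and return its output; needs the zfs binary on PATH."""
    return subprocess.check_output(
        ["zfs", "get", "-H", "-o", "name,value",
         "-t", "filesystem,volume", PROPERTY],
        universal_newlines=True,
    )
```

On a machine with ZFS, `enabled_datasets(query_zfs())` would list the datasets eligible for automatic snapshots. Datasets showing "-" or "false" are skipped.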

To start snapshotting a new dataset with zfs-auto-snapshot.py, set this property to "true" on the dataset. Example:

# zfs set com.sun:auto-snapshot=true rpool/mynewdataset

No changes need to be made to your crontab to begin snapshotting the new dataset.

To stop snapshotting a dataset, set "com.sun:auto-snapshot" to "false".

Why zfs-auto-snapshot.py is right for you

zfs-auto-snapshot.py is a good choice for your snapshot solution if you:

Why zfs-auto-snapshot.py is not right for you

zfs-auto-snapshot.py is NOT a good choice for your snapshot solution if you:

Other uses

Reduce the number of existing snapshots with --delete-only. This will not create new snapshots. To remove all hourly snapshots except the most recent one:

# zfs-auto-snapshot.py --delete-only --keep=1 hourly

Remove all of your automatic ZFS snapshots for a label by specifying --keep=0. Because this is rarely what you want, zfs-auto-snapshot.py only removes them when you also pass --force.

For example, to delete all your hourly snapshots:

# zfs-auto-snapshot.py --delete-only --keep=0 hourly --force
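The options used in these examples could be parsed along the following lines. This is only a sketch using the standard argparse module: the real script may parse its arguments differently, and making --keep mandatory and reporting a missing --force as a usage error are assumptions made for illustration.

```python
import argparse


def parse_args(argv):
    """Parse --keep=N, --delete-only, --force and a label."""
    parser = argparse.ArgumentParser(prog="zfs-auto-snapshot.py")
    # Assumption: --keep is mandatory in this sketch.
    parser.add_argument("--keep", type=int, required=True, metavar="N")
    parser.add_argument("--delete-only", action="store_true")
    parser.add_argument("--force", action="store_true")
    parser.add_argument("label")
    args = parser.parse_args(argv)
    if args.keep < 0:
        parser.error("--keep must not be negative")
    # Assumption: refusing --keep=0 without --force is a usage error.
    if args.keep == 0 and not args.force:
        parser.error("--keep=0 removes every snapshot; add --force")
    return args
```

argparse accepts options after the positional label, so the argument order in the example above (`--delete-only --keep=0 hourly --force`) parses as expected.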

BE CAREFUL: If you want to delete every snapshot before or after switching to zfs-auto-snapshot.py, you can delete all snapshots for a given dataset by running zfs destroy with '%' as the name of the snapshot. (Technically, '%' is the ZFS snapshot range delimiter. Sets of snapshots can be selected using the syntax "zpool/dataset@first%last". Instead of naming a first or a last snapshot, use '%' by itself to define the set of all snapshots.)

For example, to preview deleting all snapshots for rpool/dataset (the "-n" flag makes this a dry run):

# zfs destroy -v -n rpool/dataset@%

would destroy rpool/dataset@MyManualBackup1
would destroy rpool/dataset@zfs-auto-snap-hourly-2018-12-29_0100Z
would destroy rpool/dataset@zfs-auto-snap-hourly-2018-12-29_0200Z
would destroy rpool/dataset@zfs-auto-snap-hourly-2018-12-29_0300Z
would destroy rpool/dataset@zfs-auto-snap-hourly-2018-12-29_0400Z
would destroy rpool/dataset@zfs-auto-snap-frequent-2018-12-29_0415Z
would destroy rpool/dataset@zfs-auto-snap-frequent-2018-12-29_0430Z
would reclaim 8K

This command shows you what would be destroyed. To actually delete every snapshot, remove the "-n" flag.
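The range syntax described above is easy to get wrong when scripting, so a small helper that defaults to a dry run may be useful. This is a hypothetical sketch, not part of zfs-auto-snapshot.py; it only builds the argument list and leaves running the command to the caller.

```python
def snapshot_range(dataset, first="", last=""):
    """Build 'dataset@first%last'; empty ends select all snapshots."""
    return "{}@{}%{}".format(dataset, first, last)


def destroy_command(dataset, first="", last="", dry_run=True):
    """Return the zfs destroy argument list for a snapshot range.

    dry_run=True adds -n, so nothing is deleted unless the caller
    explicitly opts out.
    """
    cmd = ["zfs", "destroy", "-v"]
    if dry_run:
        cmd.append("-n")
    cmd.append(snapshot_range(dataset, first, last))
    return cmd
```

Calling `destroy_command("rpool/dataset")` reproduces the dry-run command shown above. Pass `dry_run=False` only after reviewing the "would destroy" output.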

Why another ZFS snapshot script?

I used to use a shell-based zfs-auto-snapshot script and I loved how easy it was to set up — just plug it into cron! — but it and its /etc/cron.(hourly|daily|weekly) scripts never seemed to work right: it only snapshotted one dataset automatically, and it wasn't even one with the "com.sun:auto-snapshot" property set to true. I had to feed it the whole list of every dataset I wanted to snapshot, which got to be a chore as I added and changed my datasets. I figured I was doing something wrong. Even its name, "zfs-auto-snapshot", implied that I shouldn't need to manually list every dataset, so I tried to read some of the code to understand it and it contained things like this:

# Check whether the pool name is a prefix of the dataset name.
if [ "$iii" != "${iii#$jjj}" ]
then
    print_log info "Excluding $ii because pool $jj is scrubbing."
    continue 2
fi

I gave up trying to understand it and switched to zfsnap, which was (at least at the time) going through a major rewrite. It worked OK but I found I had trouble remembering how to configure a rotation schedule and have my systems adhere to it based on its name-based retention policy: to do weekly snapshots on my laptop, for example, I needed to make sure my laptop was powered on and not scrubbing at the exact same time each week. I wanted something more flexible. zfsnap, too, needed a list of all my datasets to work. I wanted something more automated.

When I started looking into a long-term snapshot management strategy for FreeBSD hosts, I gave the zfs-auto-snapshot shell script another shot. Right away I discovered that the script does not work with FreeBSD's getopt(1), since it is only tested on Linux. Back to the drawing board.

In their book on the subject, Lucas and Jude suggest zfstools, which is in the FreeBSD ports tree, but it is written in Ruby. In order to use zfstools I'd need all of my machines to install yet another scripting language, and since I don't have any other software that depends on Ruby, I'd end up maintaining and patching full Ruby installs just to run one Ruby gem, which is overhead I wanted to avoid if I could.

Since I use Ansible for system deployment and management, my machines are going to wind up with Python on them, so I opted to write a Python script to create and rotate snapshots, simply. It had to be simple enough for me to write (and still be able to read six months later), and it had to adapt to my ZFS setup as I added new datasets without having to worry about updating my crontabs after every change.