Backup and Restore Your Repository

Despite newer, more advanced technologies such as SSDs, one thing regrettably remains true: sometimes things do not go as intended.  Power outages, network connectivity dropouts, corrupt RAM, and faulty hard drives are just a few examples.  This raises a very important topic: how to back up your repository data.

There are two types of backup available for Subversion repositories: full and incremental.  A full backup stores all the information required to fully reconstruct the repository in the event of a disaster.  In other words, a full backup duplicates the entire repository directory.  An incremental backup, as the name suggests, copies only the portion of the repository data that has changed since the previous backup.

Full Backups

SVN Hotcopy

The "svnadmin hotcopy" command lets you perform a "hot" full backup without temporarily disabling all other access to your repository.  By contrast, the naïve approach of a recursive directory copy (for example, the Unix cp or Windows copy operations) on a live repository runs the risk of generating a faulty backup, because a valid copy is guaranteed only when the database files are copied in a particular order. The "svnadmin hotcopy" command takes care of the details involved in making a hot backup of your repository. It is invoked like this:

 > svnadmin hotcopy /path/to/your/repo /path/to/your/repo-backup

The resulting backup is a fully functional Subversion repository, which you can use as a replacement for your live repository should the need arise.
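
Because the backup is itself a repository, it can be checked before you rely on it. The following is a minimal sketch, assuming "svnadmin" is on the PATH; the function name hotcopy_and_verify is hypothetical, and refusing to overwrite an existing path is simply a cautious choice made for this example.

```shell
# Sketch: take a hot copy, then check the copy with "svnadmin verify".
# hotcopy_and_verify is a hypothetical helper name, not a Subversion command.
hotcopy_and_verify() {
    src_repo=$1
    backup_repo=$2

    # Refuse to write over an existing path (a cautious choice for this sketch).
    if [ -e "$backup_repo" ]; then
        echo "backup target already exists: $backup_repo" >&2
        return 1
    fi

    svnadmin hotcopy "$src_repo" "$backup_repo" || return 1

    # Verify the copy rather than the live repository.
    svnadmin verify -q "$backup_repo"
}

# Example call (paths are placeholders):
#   hotcopy_and_verify /path/to/your/repo /path/to/your/repo-backup
```

The function returns a non-zero status if either step fails, so a calling script or scheduler can report the problem.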

Subversion offers additional tools and scripts related to hot backups.  For example, the "tools/backup" directory of the Subversion source distribution holds a hot-backup.py script.  This script adds backup management on top of "svnadmin hotcopy", allowing you to keep only a configured number of the most recent backups of each repository.  It automatically manages the names of the backed-up repository directories to avoid collisions with previous backups, and it rotates off older backups, deleting them so that only the most recent ones remain.  You might therefore want to run this program on a regular basis. For example, you could run hot-backup.py from a program scheduler (such as cron on Unix systems) nightly, or at whatever interval you deem safe.
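
To make the rotation idea concrete, here is a rough shell sketch of timestamped hot copies with pruning. It only mimics the behaviour described above and is not how hot-backup.py is actually implemented; the function names and the "name-YYYYMMDD-HHMMSS" naming scheme are assumptions made for illustration.

```shell
# Sketch of timestamped hot copies with rotation (illustrative only; this is
# not the implementation used by hot-backup.py).

# Create <backup_root>/<name>-YYYYMMDD-HHMMSS with svnadmin hotcopy.
timestamped_hotcopy() {
    src_repo=$1
    backup_root=$2
    name=$3
    stamp=$(date +%Y%m%d-%H%M%S)
    svnadmin hotcopy "$src_repo" "$backup_root/$name-$stamp"
}

# Keep only the newest <keep> backups of <name> under <backup_root>.
# Relies on the shell expanding globs in sorted order, so with the
# timestamp naming above the oldest backups come first.
prune_backups() {
    backup_root=$1
    name=$2
    keep=$3

    count=0
    for dir in "$backup_root/$name"-*; do
        if [ -d "$dir" ]; then
            count=$((count + 1))
        fi
    done

    excess=$((count - keep))
    for dir in "$backup_root/$name"-*; do
        if [ "$excess" -le 0 ]; then
            break
        fi
        if [ -d "$dir" ]; then
            rm -rf -- "$dir"
            excess=$((excess - 1))
        fi
    done
}

# Example nightly run (paths are placeholders):
#   timestamped_hotcopy /path/to/your/repo /path/to/backups repo &&
#       prune_backups /path/to/backups repo 7
```

Saved as a script, this could be scheduled from cron with an entry such as "0 2 * * * /usr/local/bin/svn-nightly-backup.sh", where the script path is hypothetical and the schedule means 02:00 every night.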

SVN Dump

Another way to perform a full backup of your repository is to use the "svnadmin dump" command.  The advantage of "svnadmin dump" is that the format of the backup is flexible, as it is not tied to a particular platform, versioned filesystem type, or release of Subversion or the libraries it uses. However, that flexibility comes at a significant cost: restoring the data from the backup can take a long time, and it takes longer with each new revision committed to the repository.

If you decide to use this approach, run it as follows:

 > svnadmin dump /path/to/your/repo > /path/to/your/dumpfile

At the end of the dump, you will have a single file (dumpfile in the above example) that contains all the data stored in your repository, including all revisions. Note that "svnadmin dump" reads revision trees from the repository just like any other reader process would (for example, svn checkout), so it is safe to run this command at any time.
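
Dump files are plain streams and can grow large, so they are often compressed before being stored. The sketch below assumes "gzip" is available; the function name dump_compressed is hypothetical. It writes the dump to a file first so that a failed dump is detected before anything is compressed.

```shell
# Sketch: full dump followed by gzip compression.
# dump_compressed is a hypothetical helper name.
dump_compressed() {
    repo=$1
    dumpfile=$2

    # -q suppresses the per-revision progress messages.
    # On failure, remove the partial file instead of keeping a bad backup.
    svnadmin dump -q "$repo" > "$dumpfile" || {
        rm -f "$dumpfile"
        return 1
    }

    # Replaces "$dumpfile" with "$dumpfile.gz".
    gzip -f "$dumpfile"
}

# Example call (paths are placeholders):
#   dump_compressed /path/to/your/repo /path/to/your/dumpfile
```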

To restore an SVN repository from a dumpfile, use "svnadmin load", which parses the standard input stream as a Subversion repository dumpfile and effectively replays the dumped revisions into the target repository.  The target must already exist (an empty repository can be created with "svnadmin create") before you run:

 > svnadmin load /path/to/your/newrepo < /path/to/your/dumpfile

The load adds new revisions to the target repository, which is equivalent to making commits against that repository from a regular SVN client.
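
A restore into a brand-new repository can be sketched as follows. This assumes the dumpfile was compressed with gzip; the function name restore_from_dump is hypothetical, and refusing to reuse an existing path is a cautious choice for this example rather than a Subversion requirement.

```shell
# Sketch: create an empty repository and load a gzip-compressed dump into it.
# restore_from_dump is a hypothetical helper name.
restore_from_dump() {
    dumpfile_gz=$1
    new_repo=$2

    # Avoid replaying revisions on top of an unrelated existing path.
    if [ -e "$new_repo" ]; then
        echo "target already exists: $new_repo" >&2
        return 1
    fi

    # "svnadmin load" needs an existing repository to load into.
    svnadmin create "$new_repo" || return 1

    # Decompress to stdout and feed the stream to "svnadmin load".
    gzip -dc "$dumpfile_gz" | svnadmin load -q "$new_repo"
}

# Example call (paths are placeholders):
#   restore_from_dump /path/to/your/dumpfile.gz /path/to/your/newrepo
```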

Incremental Backups

SVN Hotcopy

Starting with Subversion 1.8, "svnadmin hotcopy" accepts the --incremental option, which enables an incremental hotcopy mode for FSFS repositories. In this mode, revision data that has already been copied from the source to the destination repository is not copied again: Subversion copies only new revisions, and revisions whose size or modification time stamp has changed since the previous hotcopy operation. Unlike "svnadmin dump --incremental", the performance of "svnadmin hotcopy --incremental" is often limited only by disk I/O. Incremental hotcopy can therefore substantially speed up backups of a large repository.

 > svnadmin hotcopy /path/to/your/repo /path/to/your/repo-backup --incremental
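
A recurring job might take a full hot copy the first time and switch to incremental mode once a backup exists. The following is a minimal sketch of that choice; the function name refresh_hotcopy is hypothetical, and the explicit branch is there only to make the two cases visible.

```shell
# Sketch: full hot copy on the first run, incremental hot copy afterwards.
# refresh_hotcopy is a hypothetical helper name.
refresh_hotcopy() {
    src_repo=$1
    backup_repo=$2

    if [ -d "$backup_repo" ]; then
        # Existing backup: copy only new or changed revision data
        # (requires Subversion 1.8 or later and an FSFS repository).
        svnadmin hotcopy --incremental "$src_repo" "$backup_repo"
    else
        # No backup yet: take a full hot copy.
        svnadmin hotcopy "$src_repo" "$backup_repo"
    fi
}

# Example call (paths are placeholders):
#   refresh_hotcopy /path/to/your/repo /path/to/your/repo-backup
```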

SVN Dump

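The "svnadmin dump" command also accepts an --incremental option, although it works differently from the hotcopy one. You use the --revision (-r) option to specify a single revision, or a range of revisions, to dump; with --incremental, the first revision in that range is written as a difference against the preceding revision rather than as a complete tree. If you omit -r, all the existing repository revisions are dumped, as discussed in the full backups section.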

The benefit of this incremental backup method is that you can create several small dump files that can be loaded in succession, instead of one large one:

 > svnadmin dump /path/to/your/repo -r 0:1000 --incremental > /path/to/your/dumpfile
 > svnadmin dump /path/to/your/repo -r 1001:2000 --incremental > /path/to/your/dumpfile2
 > svnadmin dump /path/to/your/repo -r 2001:3000 --incremental > /path/to/your/dumpfile3

These dump files could be loaded into a new repository with the following command sequence:

 > svnadmin load /path/to/your/newrepo < /path/to/your/dumpfile
 > svnadmin load /path/to/your/newrepo < /path/to/your/dumpfile2
 > svnadmin load /path/to/your/newrepo < /path/to/your/dumpfile3
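
Rather than choosing revision ranges by hand, a scheduled job can remember the last revision it dumped and continue from there. The sketch below uses "svnlook youngest" to find the newest revision; the function name incremental_dump, the state file "last-dumped-rev", and the zero-padded file naming are all assumptions made for illustration.

```shell
# Sketch: dump only the revisions added since the previous run.
# incremental_dump and the "last-dumped-rev" state file are hypothetical.
incremental_dump() {
    repo=$1
    dump_dir=$2
    state_file=$dump_dir/last-dumped-rev

    # Newest revision currently in the repository.
    youngest=$(svnlook youngest "$repo") || return 1

    # Continue after the last dumped revision, or start from revision 0.
    if [ -f "$state_file" ]; then
        last=$(cat "$state_file")
        first=$((last + 1))
    else
        first=0
    fi

    if [ "$first" -gt "$youngest" ]; then
        echo "nothing new to dump"
        return 0
    fi

    # Zero-padded names keep the files in revision order when sorted.
    out=$(printf '%s/r%08d-r%08d.dump' "$dump_dir" "$first" "$youngest")

    svnadmin dump -q "$repo" -r "$first:$youngest" --incremental > "$out" || {
        rm -f "$out"
        return 1
    }

    # Record progress only after the dump succeeded.
    echo "$youngest" > "$state_file"
}

# Example call (paths are placeholders):
#   incremental_dump /path/to/your/repo /path/to/your/dumps
```

Because the file names sort in revision order, the resulting dumps can later be loaded one after another in that same order.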

Full or Incremental Backup?

Each of the backup types and methods above has its advantages and disadvantages. The simplest is undoubtedly the full hot backup, which always results in a working replica of your repository. Should something bad happen to your live repository, you can restore from the backup with a simple recursive directory copy. Unfortunately, if you maintain multiple backups of your repository, each of these full copies consumes as much disk space as your live repository. By contrast, incremental backups tend to be quicker to generate and smaller to store. However, restoring from them can be significantly more difficult, often requiring you to apply several incremental backups in the correct order.

How often should backups be taken?

Although the answer to this question depends on your specific case, repository structure, and data, the best approach to repository backups is generally a diversified one that combines the methods described above. More specifically, assuming that your repository has some other redundancy mechanism in place with relatively fine granularity (such as per-commit emails or incremental backups/dumps), a hot backup of the repository might be something that the repository administrator in your organization would want to include as part of a system-wide nightly backup.