Database corruption, versioned files unaffected

If only your database has been corrupted (that is, your db.* files were on a drive that crashed, but you were using symbolic links to store your versioned files on a separate physical drive), you only need to re-create your database.

You will need:

  • The last checkpoint file, which should be available from one of these locations:
    • the latest backup of your P4ROOT directory
    • the location indicated by the journalPrefix configurable
    • the journal prefix you supply when running the p4d -jc command. See Checkpoint and journal options.

      If, when you backed up the checkpoint, you also backed up its corresponding .md5 file, you can confirm that the checkpoint was restored correctly by comparing its checksum with the contents of the restored .md5 file.
  • The current journal file, which should be on a separate filesystem from your P4ROOT directory and is therefore unaffected by damage to the filesystem that held your P4ROOT directory.
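The checksum comparison mentioned above can be scripted. The following is a minimal sketch, not an official tool: it assumes md5sum (GNU coreutils) is installed and that the .md5 file contains the digest as a 32-character hexadecimal string. The function name verify_checkpoint_md5 is made up for this example. It hashes the checkpoint exactly as it sits on disk; whether that is the right input for a compressed checkpoint is not covered here.

```shell
#!/bin/sh
# Sketch: compare a restored checkpoint's MD5 digest with its .md5 file.
# Assumptions: md5sum is available, and the .md5 file holds the digest
# as a 32-character hex string (upper or lower case).

verify_checkpoint_md5() {
    ckp=$1
    md5file=${2:-$ckp.md5}    # default: <checkpoint>.md5 next to the checkpoint

    # Take the last 32-hex-digit string in the .md5 file and lowercase it.
    expected=$(grep -oiE '[0-9a-f]{32}' "$md5file" | tail -n 1 | tr 'A-F' 'a-f')

    # Digest of the checkpoint file as restored.
    actual=$(md5sum "$ckp" | cut -d ' ' -f 1)

    # Succeed only if a digest was found and it matches.
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

For example, `verify_checkpoint_md5 checkpoint.42 && echo "checksum matches"` prints the message only when the two digests agree.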

You will not need:

  • Your backup of your versioned files. If they were not affected by the crash, they are already up to date.

To recover the database

  1. Stop the current p4d server:

    p4 admin stop

    (You must be a Helix Core Server superuser to use p4 admin.)

  2. Rename (or move) the database (db.*) files:

    mv your_root_dir/db.* /tmp

    There can be no db.* files in the P4ROOT directory when you start recovery from a checkpoint. Although the old db.* files are never used during recovery, it’s good practice not to delete them until you’re certain your restoration was successful.

  3. Verify the integrity of your checkpoint using a command like the following:

    p4d -jv my_checkpoint_file

    The command tests the following:

    • Can the checkpoint be read from start to finish?
    • If it’s zipped, can it be successfully unzipped?
    • If it has an accompanying .md5 file, does the checkpoint’s MD5 checksum match it?
    • Does it have the expected header and trailer?

      Use the -z flag with the -jv flag to verify the integrity of compressed checkpoints.

  4. Invoke p4d with the -jr (journal-restore) flag, specifying your most recent checkpoint and current journal. If you explicitly specify the server root (P4ROOT), the -r $P4ROOT argument must precede the -jr flag. Also, because the p4d process changes its working directory to the server root upon startup, any relative paths for the checkpoint_file and journal_file must be specified relative to the P4ROOT directory:

    p4d -r $P4ROOT -jr checkpoint_file journal_file

    This recovers the database as it existed when the last checkpoint was taken, and then applies the changes recorded in the journal file since the checkpoint was taken.
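Steps 2 through 4 above can be sketched as a single shell function. This is an illustration under stated assumptions, not a supported script: the server has already been stopped (step 1), the checkpoint is uncompressed, and the names recover_db and P4D_BIN are made up for this example. Passing -r together with -jv is also a choice made here so that relative paths resolve against the server root, as they do for -jr.

```shell
#!/bin/sh
# Sketch of steps 2-4. Assumes p4d has already been stopped.
# P4D_BIN and recover_db are hypothetical names used only in this example.

P4D_BIN=${P4D_BIN:-p4d}

recover_db() {
    root=$1 save_dir=$2 checkpoint=$3 journal=$4

    # Step 2: move the old db.* files aside instead of deleting them.
    mkdir -p "$save_dir" || return 1
    for f in "$root"/db.*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        mv "$f" "$save_dir"/ || return 1
    done

    # Step 3: verify the checkpoint before replaying it.
    # (For a compressed checkpoint, add -z as described in step 3.)
    "$P4D_BIN" -r "$root" -jv "$checkpoint" || return 1

    # Step 4: replay the checkpoint, then the journal.
    # The -r argument must precede -jr.
    "$P4D_BIN" -r "$root" -jr "$checkpoint" "$journal"
}
```

A call such as `recover_db "$P4ROOT" /tmp/old_db checkpoint_file journal_file` stops at the first failing step, so a checkpoint that fails verification is never replayed.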

Note

Version 2018.1

Starting with Version 2018.1, you no longer need to specify the -z option when restoring compressed journals and checkpoints. This is especially useful when restoring a compressed checkpoint and multiple journals in the same operation. For example:

p4d -r . -jr checkpoint.42.gz journal.42 journal.43 journal

Prior to version 2018.1

If you’re using the -z (compress) option to compress your checkpoints upon creation, you’ll have to restore the uncompressed journal file separately from the compressed checkpoint.

That is, instead of using:

p4d -r $P4ROOT -jr checkpoint_file journal_file

you’ll use two commands:

p4d -r $P4ROOT -z -jr checkpoint_file.gz
p4d -r $P4ROOT -jr journal_file

You must explicitly specify the .gz extension yourself when using the -z flag, and ensure that the -r $P4ROOT argument precedes the -jr flag.
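When scripting the two-command form for servers older than 2018.1, it helps to replay the journal only if the checkpoint restore succeeded. The sketch below shows one way to do that; restore_compressed_legacy and P4D_BIN are hypothetical names introduced for this example, and only the flags shown in this section are used.

```shell
#!/bin/sh
# Sketch: pre-2018.1 restore of a compressed checkpoint plus an
# uncompressed journal. Names here are illustrative, not part of p4d.

P4D_BIN=${P4D_BIN:-p4d}

restore_compressed_legacy() {
    root=$1 checkpoint_gz=$2 journal=$3

    # The .gz extension must be given explicitly when -z is used.
    case $checkpoint_gz in
        *.gz) ;;
        *) echo "expected a .gz checkpoint name: $checkpoint_gz" >&2
           return 2 ;;
    esac

    # Replay the journal only if the checkpoint restore succeeded.
    # In both commands, -r precedes -jr.
    "$P4D_BIN" -r "$root" -z -jr "$checkpoint_gz" &&
        "$P4D_BIN" -r "$root" -jr "$journal"
}
```

Chaining with && avoids applying journal records on top of a database that was only partially restored.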

Check your system

Your restoration is complete. See Ensure system integrity after a restoration to make sure your restoration was successful.

Your system state

Once you’ve applied the accumulated changes stored in the current journal file, the database recovered from your most recent checkpoint is up to date as of the time of failure.

After recovery, both your database and your versioned files should reflect all changes made up to the time of the crash, and no data should have been lost. If restoration was successful, the lastCheckpointAction counter will indicate "checkpoint completed".
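The counter check can be scripted once the server is running again. This is a minimal sketch: it assumes the p4 client is configured to reach the restored server, and it looks only for the phrase "checkpoint completed" quoted above, without assuming anything else about the counter's format. The names checkpoint_completed, check_restore, and P4_BIN are made up for this example.

```shell
#!/bin/sh
# Sketch: confirm that lastCheckpointAction reports a completed checkpoint.
# P4_BIN, checkpoint_completed, and check_restore are hypothetical names.

P4_BIN=${P4_BIN:-p4}

# Succeeds if the given counter text contains "checkpoint completed".
checkpoint_completed() {
    case $1 in
        *"checkpoint completed"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Reads the counter from the server and checks its value.
check_restore() {
    value=$("$P4_BIN" counter lastCheckpointAction) || return 1
    checkpoint_completed "$value"
}
```

Running `check_restore && echo "restore looks complete"` prints the message only when the counter contains the expected phrase.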