Back up services

How you back up a service in your multi-server deployment depends upon the service type:

Broker
  • stores no data locally
  • back up its configuration file manually
Proxy
  • requires no backups and automatically rebuilds its cache of data if files are missing
  • contains no logic to detect when disk space is running low. Periodically monitor your proxy to ensure it has sufficient disk space.
Server
  • Follow the backup procedures described at Backup and recovery.
    • If you are using an edge-commit architecture, both the Commit Server and the Edge Servers must be backed up. See Backup and recovery planning.
  • Backup requirements for replicas that are not Edge Servers depend on your site’s needs.
  • Consider taking checkpoints offline so that your users are not blocked from accessing the primary server during lengthy checkpoint operations. See Taking checkpoints on Edge and Replica Servers.
  • Although a checkpoint (p4d -jc) is NOT supported on an edge or replica server, you CAN take a checkpoint dump on an edge or replica server (p4d -jd). See the Helix Core Server (p4d) Reference.
  • Maintaining journals:
    • on Edge Servers is a best practice
    • on replica servers is optional, and you can disable such journals by using p4d -J off
  • You can have triggers fire when the journal is rotated on an edge or replica server. See Triggering on journal rotation.
  • Journal rotation on a replica or Edge Server begins AFTER the master has completed its journal rotation.

Taking checkpoints on Edge and Replica Servers

First, run p4 admin checkpoint against the edge or replica:

p4 -p edge:1666 admin checkpoint -Z

The background journal pull command will perform the checkpoint at the next rotation of the journal on the master.

This produces a message about the scheduling of the checkpoint and writes a file named stateCKP, which contains the scheduling details, to the edge or replica server root (P4ROOT) directory. For example:

Checkpoint scheduled at 1472141783 (2020/03/26 09:16:23 -0700 PDT ); opts:

To cancel a scheduled checkpoint, remove the stateCKP file from the edge or replica P4ROOT prior to rotating the journal on the commit or master server.

Second, run p4 admin journal against the commit or master:

p4 -p commit:1666 admin journal
Rotating journal to journal.40...

Note

Do not use the -z flag with p4 admin journal or p4d -jj.

Rotated commit and master server journals must initially be uncompressed. Otherwise, replication could be adversely affected.
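
The two steps above can be combined in a small wrapper. The following is a minimal sketch, not an official Perforce script: the function name coordinated_checkpoint and the P4 override variable are hypothetical, while the commands and flags are the ones shown above.

```shell
# P4 is a hypothetical override hook (for dry runs); it defaults to the p4 client.
P4=${P4:-p4}

# Usage: coordinated_checkpoint <edge-or-replica-port> <commit-or-master-port>
# Example: coordinated_checkpoint edge:1666 commit:1666
coordinated_checkpoint() {
    edge_port=$1
    commit_port=$2

    # Step 1: schedule the checkpoint on the edge or replica.
    "$P4" -p "$edge_port" admin checkpoint -Z || return 1

    # Step 2: rotate the journal on the commit or master.
    # Deliberately no -z here: the rotated journal must stay uncompressed.
    "$P4" -p "$commit_port" admin journal
}
```

The function stops after step 1 if scheduling fails, so the journal on the commit or master is not rotated without a checkpoint being scheduled.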

Detecting coordinated checkpoint completion

To determine when a coordinated checkpoint has completed, record the journal counter on the commit or master at the time you schedule the edge or replica checkpoint. In the following example, the counter command, run against the commit or master server, reports the current value of that server's journal counter. The admin checkpoint command, run against the edge or replica, then schedules a checkpoint on that server for the next time a journal rotation is detected on the master.

p4 -p commit:1666 counter journal
40
p4 -p edge:1666 admin checkpoint -Z
The 'pull' command will perform the checkpoint at the next rotation 
of the journal on the master.

In the example above, the journal counter is reported as 40, so the next checkpoint will be numbered 41. To find out whether that checkpoint has completed, use one of the following methods.
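
The arithmetic can be captured in a script so that later checks know which checkpoint number to look for. This is a minimal sketch; the function name next_checkpoint_number is hypothetical, and it assumes the counter value is a plain non-negative integer, as in the example above.

```shell
# Print the sequence number of the next checkpoint, given the journal
# counter recorded on the commit or master (for example, 40 -> 41).
# Example:
#   next_checkpoint_number "$(p4 -p commit:1666 counter journal)"
next_checkpoint_number() {
    counter=$1
    case $counter in
        ''|*[!0-9]*)
            # Reject empty or non-numeric input rather than guessing.
            echo "not a journal counter: $counter" >&2
            return 1
            ;;
    esac
    echo $((counter + 1))
}
```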

Checkpoint Checksum

When a checkpoint completes, an md5 checksum of the checkpoint contents is written alongside the checkpoint:

$ ls -l edge1/checkpoint.41*
-r--r--r-- 1 bruno staff 11833462 Aug 25 09:59 checkpoint.41
-r--r--r-- 1 bruno staff 55 Aug 25 09:59 checkpoint.41.md5

Look for the md5 checksum file. Once it has been written, the checkpoint has completed.
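
Watching for the checksum file can be automated. The following is a minimal sketch, assuming the checkpoint and its checksum are named checkpoint.N and checkpoint.N.md5 as in the listing above; the function name, the timeout, and the polling interval are hypothetical choices.

```shell
# Wait until <dir>/checkpoint.<num>.md5 exists.
# Usage: wait_for_checkpoint_md5 <dir> <num> [timeout-seconds] [interval-seconds]
# Returns 0 once the checksum file is present, 1 if the timeout expires first.
wait_for_checkpoint_md5() {
    ckp_dir=$1
    ckp_num=$2
    timeout=${3:-3600}   # assumed default: give up after one hour
    interval=${4:-10}    # assumed default: check every 10 seconds
    waited=0
    while [ ! -f "$ckp_dir/checkpoint.$ckp_num.md5" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

For example, wait_for_checkpoint_md5 edge1 41 would block until edge1/checkpoint.41.md5 appears or an hour has passed.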

A journal-rotate trigger on the edge or replica

Configure a journal-rotate trigger on the edge or replica. This fires when the edge or replica journal is rotated. Since journal rotation is a sign of a successful checkpoint, if the trigger fires you know the checkpoint has completed. See Triggering on journal rotation.

Checkpoint History

The p4 journals command displays information from the db.ckphist table, which holds historical information about checkpoint and journal activity. For example, you can report on the last checkpoint taken using:

p4 journals -F type=checkpoint -m1
... start 1472142210
... startDate 2018/08/25 09:23:30
... end 1472142211
... endDate 2018/08/25 09:23:31
... pid 53536
... type checkpoint
... flags -q true (admin checkpoint)
... jnum 40
... jfile checkpoint.40
... jdate 1472142211
... jdateDate 2018/08/25 09:23:31
... jdigest 7A5080F52EC13518305AD2A93919864A
... jsize 11833462
... jtype text

Once a checkpoint has been scheduled and you know the sequence number of the next edge or replica checkpoint, poll the edge or replica for that checkpoint using p4 journals:

p4 journals -F 'type=checkpoint jnum=41'

The command returns without providing any output until the checkpoint has completed, at which time you'll see the details of the checkpoint completion in the p4 journals output:

p4 journals -F 'type=checkpoint jnum=41'
... start 1472144358
... startDate 2018/08/25 09:59:18
... end 1472144358
... endDate 2018/08/25 09:59:18
... pid 53757
... type checkpoint
... flags -q true (admin checkpoint)
... jnum 41
... jfile checkpoint.41
... jdate 1472144358
... jdateDate 2018/08/25 09:59:18
... jdigest 22971CDC1E26C70B1E6A58C92C4820AA
... jsize 11833460
... jtype text

Note

If a checkpoint fails, the p4 journals output contains information about the failure, including the related error message. For example:

p4 journals -m1
... start 1452184543
... startDate 2018/01/07 08:35:43
... end 1452184543
... endDate 2018/01/07 08:35:43
... pid 98622
... type checkpoint
... flags  (admin checkpoint)
... jnum 41
... jfile /Volumes/backups/checkpoint.41
... jdate 1452184543
... jdateDate 2018/01/07 08:35:43
... jdigest CFF44FD4B9B26AD90F93AC71D4E47418
... jsize 65536
... jtype text
... failed 1
... errmsg write: /Volumes/backups/checkpoint.41: No space left on device
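
A polling script can classify the p4 journals output into the three cases described above: no output yet, a completed checkpoint, or a failed one. This is a minimal sketch that assumes the "... field value" output format shown in the examples; the function name checkpoint_status and the status words it prints are hypothetical.

```shell
# Read p4 journals output on stdin and print one of:
#   pending             no record returned yet
#   completed           a record was returned without a failure flag
#   failed: <errmsg>    the record carries "... failed 1"
# Example:
#   p4 -p edge:1666 journals -F 'type=checkpoint jnum=41' | checkpoint_status
checkpoint_status() {
    awk '
        $1 == "..."                                { seen = 1 }
        $1 == "..." && $2 == "failed" && $3 == "1" { failed = 1 }
        $1 == "..." && $2 == "errmsg"              { sub(/^\.\.\. errmsg /, ""); msg = $0 }
        END {
            if (!seen)       print "pending"
            else if (failed) print "failed: " msg
            else             print "completed"
        }
    '
}
```

Checking for the failed field matters because a failed checkpoint still produces a record, so the presence of output alone does not mean the checkpoint succeeded.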