Parallel checkpointing, dumping, and recovery
The p suboption to -jc, -jd, and -jr allows the use of parallel threads for writing to, and reading from, multiple checkpoint files, one per database table.
For example, to specify 4 parallel threads, use the -N numberOfThreads option:
$ p4d -r . -N 4 -jcp cp
Checkpointing files to cp.ckp.9154...
$ p4d -r . -N 4 -jrp cp.ckp.9154
Recovering from cp.ckp.9154...
Although the number of parallel threads is typically controlled by the db.checkpoint.threads configurable, the two examples above with -N numberOfThreads show that the p4d command line can override that value.
The m (multifile) suboption is available for:
- parallel checkpoint and journal options. For example, -jcpm [-N numberOfThreads] [-Z | -z] [prefix]
- parallel journal dump and restore filtering. For example, -jdpm [-N numberOfThreads] [-Z | -z] [prefix]
where
- if [-N numberOfThreads] is omitted, the db.checkpoint.threads configurable determines the number of threads to use for the checkpoint
- -Z compresses the checkpoint, -z compresses both the checkpoint and journal, and if neither -Z nor -z is included, no compression occurs. See Helix Core Server (p4d) Reference.
When the directory argument to -jcp, -jdp, or -jrp is specified as a relative path (one that does not start with a / or \), the directory is relative to the server root, P4ROOT.
File naming convention within a recovery directory when using the parallel option
After a successful checkpoint or dump request, the file for a specific table db.xxx is named db.xxx.ckp or db.xxx.ckp.gz and is created in this directory.
For example:
db.archive.ckp.gz
db.archive.ckp.gz.md5
db.archmap.ckp.gz
db.archmap.ckp.gz.md5
...
or, for a non-compressed checkpoint or dump:
db.archive.ckp
db.archive.ckp.md5
db.archmap.ckp
db.archmap.ckp.md5
...
The files with the .md5
suffix contain the MD5 sum of their matching replay file.
When the multifile suboption (m) is specified, the files for a specific table db.xxx are named db.xxx_bbbbbbbb.ckp, where bbbbbbbb is a distinguishing batch number. The batch number always consists of 8 lowercase hexadecimal digits.
For example:
...
db.config.ckp.gz
db.config.ckp.gz.md5
...
db.revcx_00000001.ckp.gz
db.revcx_00000001.ckp.gz.md5
db.revcx_00000002.ckp.gz
db.revcx_00000002.ckp.gz.md5
db.revcx_00000003.ckp.gz
db.revcx_00000003.ckp.gz.md5
db.revcx_00000004.ckp.gz
db.revcx_00000004.ckp.gz.md5
db.revcx_00000005.ckp.gz
db.revcx_00000005.ckp.gz.md5
db.revcx_00000006.ckp.gz
db.revcx_00000006.ckp.gz.md5
db.revcx_00000007.ckp.gz
db.revcx_00000007.ckp.gz.md5
db.revcx_00000008.ckp.gz
db.revcx_00000008.ckp.gz.md5
db.revcx_00000009.ckp.gz
db.revcx_00000009.ckp.gz.md5
db.revcx_0000000a.ckp.gz
db.revcx_0000000a.ckp.gz.md5
...
db.locks_00000001.ckp.gz
db.locks_00000001.ckp.gz.md5
db.locks_00000002.ckp.gz
db.locks_00000002.ckp.gz.md5
db.locks_00000003.ckp.gz
db.locks_00000003.ckp.gz.md5
...
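As an aside, the 8-digit lowercase hexadecimal batch suffixes seen in listings like the one above can be reproduced with printf; the table name and batch values here are chosen only for illustration:

```shell
# Format hypothetical batch numbers as 8 lowercase hex digits,
# matching the db.xxx_bbbbbbbb naming pattern
for batch in 1 9 10 11; do
  printf 'db.revcx_%08x.ckp.gz\n' "$batch"
done
```

Note that batch 10 formats as 0000000a, matching the rollover from db.revcx_00000009 to db.revcx_0000000a in the listing.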
The m option might improve performance
When the p suboption is specified for a dump or checkpoint operation, each table is dumped into its own file in the checkpoint directory. At some sites, a few of the tables, such as db.have and db.integed, might be so large that the checkpoint operation needs more time for them than for tables of an average size. The m suboption causes any large table to be split into multiple output files in the checkpoint directory. With multi-threading, these output files are processed in parallel.
The size of checkpoint data might be smaller with multi-file checkpoints than with single-file checkpoints. This might occur if parallel checkpoints result in better compression ratios.
How to know the parallel checkpoint has completed
You can determine when a checkpoint has completed by using either of these two methods:
- We recommend polling the checkpoint history (ckphist) record because this method works whether the checkpoint failed or succeeded. This method was introduced in Helix Core Server version 2023.1.1.
- Polling for the existence of the md5 file is still supported, but it does not indicate when a checkpoint has failed.
Poll for the ckphist record
The syntax to poll the ckphist record is p4d -xj --jnum ckpnum, where ckpnum is the checkpoint number used to name the checkpoint file and directory.
The checkpoint command reports:
$ p4d -jcmp
Checkpointing files to checkpoint.4...
Example shell script:
#!/usr/bin/bash
P4D=$HOME/bin/p4d
# Start the multifile checkpoint request
$P4D -r . -jcmp -N 4 > out &
sleep 1
# Read the output from p4d
out=`cat out`
# Extract the checkpoint number from the p4d output
ckpnum=`expr "$out" : '.*checkpoint\.\([0-9]*\).*'`
# Search for the ckphist record for that checkpoint number
rec=`$P4D -r . -xj --jfield="startDate,endDate,jfile,failed" --jnum "$ckpnum"`
# See what we've got
echo "Found " $rec
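The expr extraction used in the script can be exercised without a running server by feeding it a sample message; the text below mirrors the p4d output shown earlier in this section:

```shell
# Sample p4d checkpoint message; no server is needed for this check
out="Checkpointing files to checkpoint.4..."

# Extract the checkpoint number, as in the script above
ckpnum=`expr "$out" : '.*checkpoint\.\([0-9]*\).*'`
echo "$ckpnum"
```

The pattern anchors on the lowercase "checkpoint." in the message, so the capitalized "Checkpointing" at the start of the line does not interfere.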
Poll for the md5 file
In the same directory that contains the checkpoint or dump directory, look for the consolidated .md5 file. This file is created when the operation has completed, whether the operation was successful or not. In the following example, this file is named checkpoint.3.md5.
$ p4d -r . -jcmp -z
Checkpointing files to checkpoint.3...
Rotating journal to journal.2.gz...
$ ls -ld checkpoint.*
drwxrwxr-x 2 perforce perforce 12288 Jun 29 11:11 checkpoint.3
-r--r--r-- 1 perforce perforce  9784 Jun 29 11:11 checkpoint.3.md5
$
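A minimal polling loop might look like the following sketch. Since no server is running here, the checkpoint's completion is simulated by creating checkpoint.3.md5 from a background subshell; in practice the file would appear when p4d finishes:

```shell
# Simulate the server finishing: create the consolidated .md5 file shortly
( sleep 1; touch checkpoint.3.md5 ) &

# Poll until the consolidated .md5 file appears
while [ ! -f checkpoint.3.md5 ]; do
  sleep 1
done
status=complete
echo "checkpoint.3 $status"
rm -f checkpoint.3.md5
```

Remember that the presence of this file means only that the operation finished, not that it succeeded; check the ckphist record to distinguish success from failure.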
The file consists of one line for each checkpoint file created in the checkpoint directory, with each line including the checkpoint file name, MD5 digest, and the epoch time stamp. For example:
MD5 (checkpoint.3/db.config.ckp.gz) = 5A32E66EE638A52F480F476B0B78191E 1688033506
MD5 (checkpoint.3/db.configh.ckp.gz) = B26E2EBA2E35B5F138792549A585276D 1688033506
MD5 (checkpoint.3/db.counters.ckp.gz) = D9A5E3CE0728B6206E4A746CB6854994 1688033506
MD5 (checkpoint.3/db.nameval_00000001.ckp.gz) = 035080F2CDFDB5BE9FC5E9D640CF5ABA 1688033506
MD5 (checkpoint.3/db.nameval_00000004.ckp.gz) = 7D052BA5C906C7C3087FE49DB4FCD48D 1688033506
...
The ordering of the records is significant. Checkpoint files that were completed first by a parallel thread are at the top of the file.
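Each record can be picked apart with standard shell tools; the sample line below is copied from the example output above:

```shell
# One line from the consolidated .md5 file
line='MD5 (checkpoint.3/db.config.ckp.gz) = 5A32E66EE638A52F480F476B0B78191E 1688033506'

# The checkpoint file name sits between the parentheses
file=${line#"MD5 ("}
file=${file%%")"*}

# Digest and epoch time stamp are the 4th and 5th whitespace-separated fields
digest=$(echo "$line" | awk '{print $4}')
stamp=$(echo "$line" | awk '{print $5}')
echo "$file $digest $stamp"
```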
Prefix for parallel checkpoints
Checkpoint files are placed into a newly created directory based on the prefix for the checkpoint or dump. Specifying a prefix on the p4d -jcp command overrides the prefix set by the journalPrefix configurable.
Configurables for tuning checkpoints
The values and purpose of the configurables that you can use to tune checkpoints:

Configurable | Default | Min | Max | Use
---|---|---|---|---
db.checkpoint.reqlevel | 4 | 2 | 20 | Only database files at this level or deeper in the btree are considered to be split into multiple checkpoint files during a checkpoint or dump request.
db.checkpoint.worklevel | 3 | 2 | 20 | The page level examined within a database table that supplies the record keys used to split that table during a multifile checkpoint operation.
db.checkpoint.numfiles | 10 | 1 | 20000 | Used to determine how many checkpoint files should be generated during a multifile checkpoint operation. This value can be overridden by the --numfiles option of the p4 dbstat command.
db.checkpoint.threads | 0 | 0 | 4096 | Maximum number of threads to use in a checkpoint, dump, or recovery. The value must be 2 or greater for a multifile request to split a table. Many factors might affect performance (CPU, memory, disks, controllers, file system, system load), so no simple way exists to determine the best value. Start with a value such as 4, 6, or 8, then monitor processor and I/O performance to determine whether a larger value is appropriate for your system.
How the number of checkpoint files is calculated
The value of db.checkpoint.numfiles is used to determine the number of keys to generate. The total number of pages found at the worklevel is divided by the "effective numfile" (en) value. The "effective numfile" value is calculated by this formula:

en = n ^ (a - w)

where
- n is the value of db.checkpoint.numfiles,
- a is the depth of the current database file, and
- w is the value of db.checkpoint.worklevel.

For example, if a = 5, w = 3, and n = 10, then en = 10 ^ (5 - 3) = 10 ^ 2 = 100, so in this case setting db.checkpoint.numfiles to 10 results in 100 "effective numfiles".