System resources

To help maintain consistent system performance, configure Helix Core Server to monitor system resources, balance loads, and prevent large spikes in resource usage. When resources become limited, the server can reduce the amount of work it accepts. It can:

  • Pause incoming commands when medium thresholds are reached.

  • Terminate incoming commands when high thresholds are reached.

Resource monitoring is generally more useful on Linux than on Windows. Servers with many concurrently running commands benefit the most.

Configure your server monitoring so that the thresholds exceed the normal load. Setting the thresholds too low can cause unnecessary slowdowns by failing to use available system resources.

Set the medium and high thresholds to significantly different values.

To help avoid the risk of high memory levels that cause the server to start terminating processes, use the p4 group MaxMemory field. To learn more, see MaxLimit fields in Helix Core Command-Line (P4) Reference.

For additional information and advice about how to set the limits, see the output of the p4 help server-resources command.

Configurables

To customize resource monitoring, you can set the following configurables.

Pausing

When any resource exceeds its threshold, the server pauses incoming commands until resource usage has dropped below the threshold.

Configurable

Meaning

sys.pressure.max.pause.time Maximum number of seconds a command can be in the paused state before the server returns an error to the client. Setting this configurable to 0 disables pausing commands. To preview the effect of the configured thresholds, set sys.pressure.max.pause.time to 0. This enables the p4 admin resource-monitor background task to sample resources and set pressure levels without pausing commands.
sys.pressure.max.paused Maximum number of client commands that are concurrently paused on the server. New incoming commands that exceed this threshold are rejected with an error. Some administrative and replication commands are not subject to pausing.

A Helix Core Server extension can register a "pressure-pause" hook to be called periodically while a command is paused. The extension can continue the pause, resume execution, or defer the choice to the server, which allows it to implement a site-specific throttling policy. To learn more, see the Helix Core Extensions Documentation.

A paused command is visible with p4 monitor show. The time spent in the paused state is recorded in the tracking entries of the server log files. See paused state in P4LOG in the Helix Core Command-Line (P4) Reference.
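As an illustration of the two pausing configurables above, the following sketch sets them with p4 configure set and then lists server processes so that paused commands can be spotted. The values 300 and 100 are placeholders chosen for this example, not recommendations from this documentation.

```shell
# Illustrative values only; base real limits on your own usage baselines.
p4 configure set sys.pressure.max.pause.time=300   # seconds a command may stay paused
p4 configure set sys.pressure.max.paused=100       # maximum concurrently paused commands

# List server processes; paused commands are visible in this output.
p4 monitor show
```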

Percentage-based memory

These settings can help ensure sufficient memory for the file system cache of the operating system. Percentage-based calculation is most applicable to large-scale sites with abundant physical memory. In contrast, smaller sites with more limited memory typically operate closer to their capacity, so routine swapping might be normal.

Percentage-based memory can be set within the range of 0 to 100. For example, you might limit work when 75% of total system memory is used.

Only use percentage-based memory on Windows where swap is not normally used.

Configurable

Meaning

sys.pressure.mem.medium When commands begin pausing. For example, if sys.pressure.mem.medium is set to 60, restrictions are applied when the usage of physical memory resources calculated during the last duration exceeds 60%.
sys.pressure.mem.high The operating system is about to resort to swapping memory and the Out Of Memory Killer might terminate processes. Existing commands that request more memory while the server is above this threshold might be canceled and return an error to the client.
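A minimal sketch of setting the two percentage-based thresholds follows. The value 60 mirrors the example in the table above; the value 80 is a hypothetical high threshold picked only to show a clear gap between the two settings.

```shell
# 60 comes from the example above; 80 is an assumed, illustrative value.
p4 configure set sys.pressure.mem.medium=60
p4 configure set sys.pressure.mem.high=80
```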

OS-supplied resource pressure thresholds

If your Helix Core Server runs on Windows, sys.pressure.os.mem.high is the sole resource pressure threshold supplied by the operating system.

If your Helix Core Server runs on Linux, you can benefit from all three of the OS-supplied resource pressure thresholds. To learn more, see Verifying OS-supplied resource pressure thresholds on Linux.

The following configurables can be set between 0 and 100, and represent the percentage of processes on the system stalled for the resource.

Configurable

Meaning

sys.pressure.os.mem.medium (Linux) The server tries to keep memory usage below this level. If only sys.pressure.os.mem settings are set, and sys.pressure.os.mem.medium is set to 60 (percent), even if the system uses more than 60% of memory and swap resources, restrictions will not be applied unless the percentage of processes stalled due to memory reaches the 60% threshold.
sys.pressure.os.mem.high (Windows and Linux) At this threshold, the server rejects new incoming commands. Existing commands that request more memory might be canceled and return an error to the client. When Helix Core Server limits its work, it does not distinguish between memory used by Helix Core Server and memory used by other processes. Therefore, if a large external process begins to consume a large amount of memory, Helix Core Server can throttle itself.
sys.pressure.os.cpu.high (Linux) Some processes on the system are waiting for CPU time. For Linux cgroup support, only the system-wide /proc/pressure/* files are considered.
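On Linux, the /proc/pressure files mentioned above can be read directly to see the kind of values that these thresholds are compared against. The helper below is a sketch for exploring those files. The function name psi_avg is hypothetical, and which line ("some" or "full") and which averaging window the server itself reads is not stated here, so treat the choice of arguments as an assumption.

```shell
# psi_avg FILE KIND WINDOW
#   FILE   : a PSI file such as /proc/pressure/memory
#   KIND   : "some" or "full"
#   WINDOW : avg10, avg60, or avg300
# Prints the matching average; fails if the field is not found.
psi_avg() {
    awk -v kind="$2" -v win="$3" '
        $1 == kind {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == win) { print kv[2]; found = 1 }
            }
        }
        END { exit !found }
    ' "$1"
}
```

For example, psi_avg /proc/pressure/memory some avg10 prints the 10-second memory pressure average, which can be sampled over time to establish a baseline before choosing thresholds.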

Duration

The duration specifies the number of milliseconds for averaging samples of resource pressure. A longer duration makes the server less sensitive to changes in pressure.

Configurable

Meaning

sys.pressure.mem.high.duration Number of milliseconds for averaging sys.pressure.mem.high
sys.pressure.mem.medium.duration Number of milliseconds for averaging sys.pressure.mem.medium
sys.pressure.os.cpu.high.duration Number of milliseconds for averaging sys.pressure.os.cpu.high
sys.pressure.os.mem.high.duration Number of milliseconds for averaging sys.pressure.os.mem.high
sys.pressure.os.mem.medium.duration Number of milliseconds for averaging sys.pressure.os.mem.medium

A high threshold must be set higher than the corresponding medium threshold. Otherwise, the default values are used.
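Because a misordered pair of thresholds silently falls back to the defaults, it can be worth checking candidate values before applying them. The function below is a small sketch of such a check. The name thresholds_valid is hypothetical, and the check covers only the ordering rule and the 0 to 100 range described in this section.

```shell
# thresholds_valid MEDIUM HIGH
# Succeeds only if both values are whole numbers from 0 to 100
# and HIGH is strictly greater than MEDIUM.
thresholds_valid() {
    [ -n "$1" ] && [ -n "$2" ] || return 1
    case "$1$2" in
        *[!0-9]*) return 1 ;;   # reject anything that is not a digit
    esac
    [ "$1" -le 100 ] && [ "$2" -le 100 ] && [ "$2" -gt "$1" ]
}
```

A call such as thresholds_valid 60 80 succeeds, while thresholds_valid 80 60 fails and signals that the server would ignore the pair.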

Prerequisites

The server can respond to resource pressure if:

  • The operating system supports pressure thresholds. Support differs between Windows and Linux. See OS-supplied resource pressure thresholds.

  • Real-time monitoring counters are enabled.

  • The p4 admin resource-monitor server startup command is running. See the command-line output of the p4 help admin-resource-monitor command.

  • Existing resource usage baselines are established.

  • The thresholds that cause pausing are high enough to avoid constant pausing.
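Some of the prerequisites above can be spot-checked from the command line. The commands below are a sketch of one way to do that. It assumes that process monitoring is enabled so that p4 monitor show returns output, and that the resource monitor appears there as a background task.

```shell
# Confirm that a real-time monitoring file is configured.
p4 configure show rt.monitorfile

# Look for a startup entry that runs the resource monitor.
p4 configure show allservers | grep 'startup'

# Check whether the background task appears among the running processes.
p4 monitor show -a -l
```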

An example of setting up resource monitoring

p4 serverid $name
p4 configure set rt.monitorfile=$monitor_file
p4 configure set "$name#startup.1=admin resource-monitor"
p4 admin restart

Exceptions to pausing

The following commands are not subject to being paused under pressure:

admin configure counter counters dbstat dbverify depots diskspace export extension failback failover heartbeat info journalcopy journaldbchecksums journals lockstat logappend login login2 logout logparse logrotate logschema logstat logtail monitor passwd ping protect pull serverid servers topology triggers user
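When analyzing which commands in a workload could be paused, it can help to test names against the list above. The following sketch does so. The variable PAUSE_EXEMPT and the function is_pause_exempt are hypothetical names, and the list is copied from this section, so it must be updated by hand if the server's list changes.

```shell
# Commands that are not paused under pressure (copied from the list above).
PAUSE_EXEMPT="admin configure counter counters dbstat dbverify depots \
diskspace export extension failback failover heartbeat info journalcopy \
journaldbchecksums journals lockstat logappend login login2 logout \
logparse logrotate logschema logstat logtail monitor passwd ping protect \
pull serverid servers topology triggers user"

# is_pause_exempt COMMAND
# Succeeds if COMMAND (for example "login", without the "user-" prefix
# seen in server logs) is in the exempt list.
is_pause_exempt() {
    [ -n "$1" ] || return 1
    case " $PAUSE_EXEMPT " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}
```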

Preview and activate monitoring

You can generate a preview of system resource monitoring. Then, adjust the settings to meet your requirements before you activate monitoring in the production environment.

Complete the following steps:

  1. Set a server ID name:
    p4 serverid $name

  2. Enable real-time monitoring:
    p4 configure set rt.monitorfile=$monitor_file

  3. Enable the resource monitoring background process:
    p4 configure set "$name#startup.1=admin resource-monitor"

  4. Enable preview mode, which prevents pausing:
    p4 configure set sys.pressure.max.pause.time=0

  5. Restart the server:
    p4 admin restart

  6. To preview the pauses, check the "Server under resource pressure. Pause rate" message in the log entries of the p4 admin resource-monitor background task.

  7. Adjust the configurables, if necessary, until you are satisfied with the preview results.

  8. Activate monitoring:
    p4 configure unset sys.pressure.max.pause.time
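During the preview in step 6, counting the pressure messages in the server log gives a rough sense of how often pausing would have occurred. The helper below is a sketch of that count. The function name count_pressure_messages is hypothetical, and it matches only the fixed message text quoted in step 6, because the full format of the log line is not shown here.

```shell
# count_pressure_messages LOGFILE
# Prints how many lines in LOGFILE contain the pressure message.
# grep exits with status 1 when the count is 0; that case is treated as success.
count_pressure_messages() {
    grep -cF 'Server under resource pressure' "$1" || [ $? -eq 1 ]
}
```

Running it before and after each adjustment in step 7 shows whether a change to the thresholds reduced the number of reported pressure events.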

Disable monitoring

Each of the following disables monitoring:

  • Turning off the p4 admin resource-monitor startup command, either with p4 monitor terminate, or by removing the startup configurable and restarting the server.

  • Setting the sys.pressure.max.pause.time configurable or the sys.pressure.os.*.high configurables to 0.
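The options above can be sketched as the following commands. The sketch assumes that $name holds the server ID and that the resource monitor was registered in the startup.1 slot, as in the setup example in this section; adjust the slot number if your server uses a different one.

```shell
# Option 1: remove the startup command, then restart the server.
p4 configure unset "$name#startup.1"
p4 admin restart

# Option 2: stop commands from being paused by setting the pause time to 0.
p4 configure set sys.pressure.max.pause.time=0
```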

Example of resource pressure actions

When resource pressure thresholds are reached or exceeded, you might see results similar to these examples.

Server log:

2024/06/19 12:25:31 560465376 pid 1056102: Server is now using 55 active threads.
2024/06/19 12:25:31 560486548 pid 1056102: Server now has 10 paused threads.

Here, the paused threads result from resource monitoring being active.

For any paused command, if the track configurable is set to 1, you might see:

Perforce server info:
2024/06/19 12:25:31 pid 1056864 perforce@ip-10-0-0-106 127.0.0.1 [p4/2024.1/LINUX26X86_64/2611120] 'user-fstat -Ob //...'
--- lapse 8.39s
--- paused 1.20s
--- usage 598+67us 304+0io 0+0net 68864k 0pf
--- memory cmd/proc 74mb/74mb
...

where line 4 indicates that the command was paused for 1.20 seconds.
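To see how much time commands spend paused overall, the "--- paused" tracking lines can be summed across a log file. The function below is a sketch of that calculation. The name total_paused_seconds is hypothetical, and it assumes that every paused entry has the form shown in the example above, with the time in seconds followed by "s".

```shell
# total_paused_seconds LOGFILE
# Sums the times on all "--- paused <n>s" tracking lines
# and prints the total in seconds with two decimal places.
total_paused_seconds() {
    awk '
        $1 == "---" && $2 == "paused" {
            sub(/s$/, "", $3)   # strip the trailing unit
            total += $3
        }
        END { printf "%.2f\n", total }
    ' "$1"
}
```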

For a command that was rejected because resource monitoring thresholds were exceeded, the track output shows "--- exited on fatal server error" on line 11:

Perforce server error:
Date 2024/06/19 12:25:31:
Pid 1056860
Operation: user-fstat
Operation 'user-fstat' failed.
Too many commands paused;  terminated.
Perforce server info:
2024/06/19 12:25:31 pid 1056860 completed .008s 0+0us 0+8io 0+0net 12828k 0pf
Perforce server info:
2024/06/19 12:25:31 pid 1056860 perforce@ip-10-0-0-106 127.0.0.1 [p4/2024.1/LINUX26X86_64/2611120] 'user-fstat -Ob //...'
--- exited on fatal server error
--- lapse .008s
--- memory cmd/proc 21mb/21mb
...

The error entry also appears in the errors.csv structured log.
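Rejected commands can be located in the server log by pairing each "Too many commands paused" message with the Pid line that precedes it. The following is a sketch of that search. The function name rejected_pids is hypothetical, and it assumes that error blocks follow the layout of the example above.

```shell
# rejected_pids LOGFILE
# Prints the pid of each command rejected with "Too many commands paused".
rejected_pids() {
    awk '
        $1 == "Pid" { pid = $2 }                  # remember the latest Pid line
        /Too many commands paused/ && pid != "" {
            print pid
            pid = ""                              # report each error block once
        }
    ' "$1"
}
```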

Verifying OS-supplied resource pressure thresholds on Linux

For Linux systems, enable cgroups v2 and Pressure Stall Information (PSI).

To verify that cgroups v2 is mounted:

mount -l | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot)

To verify that Pressure Stall Information (PSI) is working:

ls /proc/pressure
cpu io irq memory
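The two manual checks above can also be scripted, for example as part of a host readiness check. The functions below are a sketch of that idea. The names psi_available and cgroup2_mounted are hypothetical, and reading /proc/mounts instead of parsing mount -l output is a choice made for this example.

```shell
# psi_available [DIR]
# Succeeds if the cpu, io, and memory PSI files are readable in DIR
# (default: /proc/pressure).
psi_available() {
    dir="${1:-/proc/pressure}"
    for f in cpu io memory; do
        [ -r "$dir/$f" ] || return 1
    done
}

# cgroup2_mounted [MOUNTS_FILE]
# Succeeds if MOUNTS_FILE (default: /proc/mounts) lists a cgroup2 file system.
cgroup2_mounted() {
    awk '$3 == "cgroup2" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}
```

For example, psi_available && cgroup2_mounted && echo "ready" prints "ready" only when both conditions hold on the current host.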

For examples of how to enable cgroups v2 and PSI for common Linux distributions, see the Perforce Knowledge Base article, Enabling OS-supplied resource pressure thresholds on Linux.