Perforce IPLM High Availability with HAProxy and Neo4j cluster: Manage and configure Neo4j cluster

Previous step: Perforce IPLM High Availability with HAProxy and Neo4j cluster: Installation

Manage Neo4j cluster

All nodes in the Neo4j cluster must be configured before any of them is started; the nodes can then be started one by one. A node opens its service port (7474 by default) and accepts requests only after it has connected to and negotiated with the other nodes in the cluster.

Configure Neo4j cluster

Neo4j configuration file location

  • For a node installed from the neo4j-enterprise software package (RPM or Debian package), the config file path is

    • /etc/neo4j/neo4j.conf

  • For a node installed from the PiServer Self-Extracting archive, the config file path is

    • <neo4j-enterprise-3.5.3>/conf/neo4j.conf

Enable Neo4j HA

To run the Neo4j cluster in HA mode, set the following options in the Neo4j config file on each node.

  • dbms.connector.http.listen_address=0.0.0.0:7474
  • dbms.mode=HA
  • dbms.security.ha_status_auth_enabled=false
  • ha.server_id=<unique numeric ID within the cluster, for example 1, 2, or 3 in a 3-node cluster>
    • Ex. ha.server_id=1
  • ha.initial_hosts=<comma-separated list of host:port entries, one per node in the cluster; the port is usually 5001>
    • Ex. ha.initial_hosts=centos7-ha1:5001,centos7-ha2:5001,centos7-ha3:5001
  • ha.host.coordination=<this node's host:port entry from the list above>
    • Ex. ha.host.coordination=centos7-ha1:5001
  • ha.host.data=<host:port for this node; the port is usually 6001>
    • Ex. ha.host.data=centos7-ha1:6001
  • dbms.logs.http.enabled=true (uncomment this line to enable Neo4j-side HTTP logs; highly recommended)
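
As a minimal sketch, the options above can be generated per node with a small shell function. The function name render_ha_config is hypothetical, and the ports 5001 and 6001 are simply the usual values mentioned above; review the output before adding it to neo4j.conf.

```shell
# Print the HA settings for one node (hypothetical helper, not part of Neo4j).
# Usage: render_ha_config <server_id> <this_host> <host> [<host> ...]
render_ha_config() {
    server_id=$1
    this_host=$2
    shift 2

    # Build the comma-separated host:port list from the remaining arguments.
    initial_hosts=""
    for host in "$@"; do
        initial_hosts="${initial_hosts:+$initial_hosts,}${host}:5001"
    done

    cat <<EOF
dbms.connector.http.listen_address=0.0.0.0:7474
dbms.mode=HA
dbms.security.ha_status_auth_enabled=false
ha.server_id=${server_id}
ha.initial_hosts=${initial_hosts}
ha.host.coordination=${this_host}:5001
ha.host.data=${this_host}:6001
dbms.logs.http.enabled=true
EOF
}
```

For example, render_ha_config 1 centos7-ha1 centos7-ha1 centos7-ha2 centos7-ha3 prints the settings for the first node of the example cluster; only ha.server_id, ha.host.coordination, and ha.host.data differ between nodes.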

HA clustering tunables

The following two tunables enable optimistic push as the replication mechanism. They can be applied to clusters with 10 or more slaves without ill effect.

  • ha.pull_interval=500ms (or even lower)
  • ha.tx_push_factor=2 (or number of nodes in cluster minus one)
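
The "number of nodes minus one" rule can be sketched as a small shell function. The name render_ha_tunables is hypothetical, and rejecting clusters with fewer than 2 nodes is an assumption made here so that the push factor is never zero.

```shell
# Print the two replication tunables for a cluster of <node_count> nodes
# (hypothetical helper). The push factor follows the
# "number of nodes in cluster minus one" rule.
render_ha_tunables() {
    node_count=$1
    if [ "$node_count" -lt 2 ]; then
        echo "node count must be at least 2" >&2
        return 1
    fi
    echo "ha.pull_interval=500ms"
    echo "ha.tx_push_factor=$((node_count - 1))"
}
```

Calling render_ha_tunables 3 yields the ha.tx_push_factor=2 value shown above.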

Apply this if your dataset takes more than 2 minutes to replicate from scratch:

  • ha.role_switch_timeout=3600s

Additional tunables

See Neo4j configuration to review additional tunables that apply to your ecosystem, especially the following entries related to capacity and resiliency:

  • mdx.pagination_cli_page_size

  • mdx.pagination_cli_concurrent_requests

  • mdx.transaction_max_retries_entity_not_found

  • mdx.transaction_max_retries_other

  • mdx.transaction_retries_delay

Enable online backup

To enable online backup for the Neo4j HA cluster, set the following options in the Neo4j config file on the node that serves online backups.

  • dbms.tx_log.rotation.size=1M
  • dbms.tx_log.rotation.retention_policy=3 files
  • dbms.backup.enabled=true
  • dbms.backup.address=0.0.0.0:6362
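
Once these options are in place, a backup can be pulled from that node. The sketch below assumes the neo4j-admin backup command shipped with Neo4j Enterprise 3.x; the function name, the backup directory, and the default backup name are hypothetical. Setting DRY_RUN prints the command instead of running it.

```shell
# Run an online backup against the backup node, or only print the command
# when DRY_RUN is set to a non-empty value (hypothetical helper).
# Usage: run_online_backup <backup_host> <backup_dir> [<backup_name>]
run_online_backup() {
    backup_host=$1                     # node with dbms.backup.enabled=true
    backup_dir=$2                      # existing directory that receives the backup
    backup_name=${3:-graph.db-backup}  # subdirectory name for this backup

    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty.
    ${DRY_RUN:+echo} neo4j-admin backup \
        --from="${backup_host}:6362" \
        --backup-dir="$backup_dir" \
        --name="$backup_name"
}
```

The port 6362 matches the dbms.backup.address value above; adjust it if the backup node listens elsewhere.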

Migrate old Neo4j DB to Neo4j cluster

If an IPLM Server/Neo4j instance is already running and its data needs to be transferred to the Neo4j cluster, follow the steps below to copy the data.

  1. On the existing PiServer/Neo4j instance, load the PiServer licenses for the nodes in the Neo4j cluster.

  2. Stop the existing PiServer/Neo4j instance.

  3. Transfer the folder /var/lib/mdx-neo4j/data/databases/graph.db or <piserver install dir>/neo4j-enterprise-3.5.35/data/databases/graph.db to each node in the Neo4j cluster.

  4. Then, on each node in the Neo4j cluster, run these commands:

mkdir -p /var/lib/neo4j/data/databases/
mv <graph.db folder that's copied over> /var/lib/neo4j/data/databases/
chown -R neo4j:neo4j /var/lib/neo4j/data/databases

Note: Make sure the /var/lib/neo4j/data/dbms folder remains unchanged on each node in the Neo4j cluster.
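
The transfer in step 3 can be scripted from the old server. This is a sketch only: it assumes scp access to every cluster node, and the function name and the staging directory on the target nodes are hypothetical. With DRY_RUN set, the commands are printed rather than executed.

```shell
# Copy the stopped instance's graph.db folder to a staging directory on each
# cluster node (hypothetical helper; assumes scp access to the nodes).
# Usage: copy_graph_db <graph_db_dir> <staging_dir> <node> [<node> ...]
copy_graph_db() {
    graph_db_dir=$1   # e.g. /var/lib/mdx-neo4j/data/databases/graph.db
    staging_dir=$2    # directory on each target node that receives the copy
    shift 2
    for node in "$@"; do
        # Stop at the first failed transfer so no node is silently skipped.
        ${DRY_RUN:+echo} scp -r "$graph_db_dir" "${node}:${staging_dir}/" || return 1
    done
}
```

After the copy, the commands from step 4 still have to be run on each node to move the folder into place and fix its ownership.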

(Optional) Run Neo4j consistency checker

After transferring the old Neo4j DB to a new Neo4j node, you can run the Neo4j consistency checker to verify that the database is consistent.

[root@centos7-ha1 ~]# neo4j-admin check-consistency --database=graph.db
WARNING: Max 1024 open files allowed, minimum of 40000 recommended. See the Neo4j manual.
2020-06-27 06:09:47.740+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Selected RecordFormat:StandardV3_2[v0.A.8] record format from store /var/lib/neo4j/data/databases/graph.db
2020-06-27 06:09:47.742+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Format not configured. Selected format from the store: RecordFormat:StandardV3_2[v0.A.8]
.................... 10%
.................... 20%
.................... 30%
.................... 40%
.................... 50%
.................... 60%
.................... 70%
.................... 80%
.................... 90%
.................Checking node and relationship counts
.................... 10%
.................... 20%
.................... 30%
.................... 40%
.................... 50%
.................... 60%
.................... 70%
.................... 80%
.................... 90%
.................... 100%
[root@centos7-ha1 ~]#
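
The WARNING line in the output above refers to the open-file limit of the shell that runs the checker. A minimal pre-check, with the hypothetical helper name open_files_ok and the 40000 threshold taken from that warning, could look like this:

```shell
# Return success when an open-file limit meets the recommended minimum
# (hypothetical helper). <limit> is a number or the word "unlimited".
# Usage: open_files_ok <limit> [<minimum>]
open_files_ok() {
    limit=$1
    minimum=${2:-40000}
    [ "$limit" = "unlimited" ] && return 0
    [ "$limit" -ge "$minimum" ]
}

# Warn before running the checker if the current shell's limit is too low.
if ! open_files_ok "$(ulimit -n)"; then
    echo "open-file limit $(ulimit -n) is below the recommended 40000" >&2
fi
```

How the limit is raised depends on the operating system and on how Neo4j is launched, so that step is left out of the sketch.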

Start Neo4j Cluster

Start the Neo4j server on each node in the cluster.

  • For a node installed from the neo4j-enterprise software package (RPM or Debian package), use the following command to start the Neo4j service:

    • service neo4j start

  • For a node installed from the IPLM Server Self-Extracting archive, use the following commands to start the Neo4j service:

    • cd <neo4j-enterprise-3.5.3>

    • ./bin/neo4j start

Note:

  • All nodes must be started and must complete negotiation before any node in the cluster can start serving requests.
  • For a large Neo4j DB, it may take a while for all nodes to synchronize. In a typical LAN environment, a cluster with a ~6.5 GB graph DB has been observed to take up to a minute to synchronize.
  • The procedure in Migrate old Neo4j DB to Neo4j cluster also works if the old graph DB is transferred to only one of the Neo4j cluster nodes, but the nodes then take longer to synchronize when the Neo4j service starts. In a typical LAN environment, a ~6.5 GB graph DB may take around 5 minutes to synchronize.

To stop the Neo4j server or check its status, replace start with stop or status in the commands above.
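
Because a node only serves requests after negotiation completes, it can help to poll each node after starting it. The sketch below assumes the Neo4j 3.x HA status endpoint /db/manage/server/ha/available on port 7474, which answers with the node's role once it has joined the cluster, and assumes dbms.security.ha_status_auth_enabled=false so that no credentials are needed. The function names and the default retry values are hypothetical.

```shell
# Ask one node for its HA role; prints "master" or "slave" once it has joined.
# curl -f turns the HTTP error returned by an unavailable node into a failure.
fetch_ha_state() {
    curl -fsS --max-time 5 "http://$1:7474/db/manage/server/ha/available"
}

# Poll a node until it reports a role, or give up after <tries> attempts.
# Usage: wait_for_ha <host> [<tries>] [<delay_seconds>]
wait_for_ha() {
    host=$1
    tries=${2:-60}
    delay=${3:-5}
    while [ "$tries" -gt 0 ]; do
        if state=$(fetch_ha_state "$host" 2>/dev/null); then
            echo "${host}: ${state}"
            return 0
        fi
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    echo "${host}: not available" >&2
    return 1
}
```

With the defaults, wait_for_ha centos7-ha1 waits up to about five minutes, which covers the synchronization times noted above; larger databases may need more attempts.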

Neo4j logs

Neo4j daemon logs are in different locations, depending on how the Neo4j Enterprise software is installed.

  • On RHEL6/CentOS6, the Neo4j logs are under /var/log/neo4j/.

  • On RHEL7/CentOS7/Debian, the basic Neo4j logs can be viewed by running journalctl -u neo4j. More logs can be found under /var/log/neo4j/.

  • If the Neo4j Enterprise software is installed with the IPLM Server Self-Extracting archive, the logs can be found in the folder defined by dbms.directories.logs in the Neo4j configuration file.
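
For the archive-based install, the configured log folder can be read straight from the config file. This is a minimal sketch; the function name neo4j_log_dir is hypothetical.

```shell
# Print the value of dbms.directories.logs from a Neo4j config file
# (hypothetical helper). Commented-out lines are ignored; if the setting
# appears more than once, the last occurrence is printed.
# Usage: neo4j_log_dir <path to neo4j.conf>
neo4j_log_dir() {
    sed -n 's/^[[:space:]]*dbms\.directories\.logs[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}
```

The function prints nothing when the setting is commented out or missing, in which case Neo4j falls back to its built-in default location.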

Next step: Perforce IPLM High Availability with HAProxy and Neo4j cluster: Deploy IPLM Server