Jira Data Center cache replication
The cache replication in Jira Data Center 7.9 and later is asynchronous, which means that cache modifications aren’t replicated to other nodes immediately after they occur, but are instead added to local queues. They are then replicated in the background based on their order in the queue.
This approach helps us to:
- Improve the scalability of the cluster
- Reduce the amount of cache inconsistencies between the nodes
- Separate the replication from any cache modifications, simplifying and speeding up the whole process
Replicating cache modifications
Local queues
Before we can queue cache modifications, we need to create local queues on each of your nodes. There are separate queues created for each node, so that modifications are properly grouped and ordered.
The queues are created automatically. Whenever you add a node to the cluster, the remaining nodes detect it and create a new queue on their file system. The nodes retrieve information about the whole cluster from the database by using the following query:
select node_id, node_state from clusternode
The output of this query looks similar to this (you'll see more information, like port number or IP address):
node_id | node_state |
---|---|
Node 1 | ACTIVE |
Node 2 | ACTIVE |
Node 3 | ACTIVE |
Node 4 | OFFLINE |
Each node creates 10 separate queues on its file system for each node that is in the active
state. We've chosen 10 queues to increase the throughput—we can't speed up the replication of a single modification, because there are a number of actions that must be completed, but we can replicate 10 modifications at the same time.
In a 3-node cluster with Node 1, Node 2, and Node 3, the following queues will be created:
On Node 1:
- 10 queues for replicating modifications to Node 2
- 10 queues for replicating modifications to Node 3
On Node 2:
- 10 queues for replicating modifications to Node 1
- 10 queues for replicating modifications to Node 3
On Node 3:
- 10 queues for replicating modifications to Node 1
- 10 queues for replicating modifications to Node 2
How does this look on the file system?
The queues are created in the <local-home-directory>/localq
directory on each node. The contents of this directory look similar to this:
queue_Node2_0_f5f366263dcc357e2720042f33286f8f
..
queue_Node2_9_f5f366263dcc357e2720042f33286f8f
queue_Node3_0_bbb747b1516b5225f1ec1c65887b39fc
..
queue_Node3_9_bbb747b1516b5225f1ec1c65887b39fc
Once the queues are created, they're ready to queue cache modifications. Whenever you add a node to the cluster, another 10 queues for this node will be created on each existing node. The new node will create queues for all existing active nodes.
Adding cache modifications to local queues
After all the queues are in place, each caching event that occurs on a specific node can be added to the right queue.
Let’s use Node 1 as an example.
On Node 1:
- A caching event occurs (for example, you make changes to the permission scheme).
- The changes are made in the database, and the local cache related to this change is removed.
- A request to remove the cache is added to the following local queues, from which it will be replicated to the right nodes.
- queue_Node2 for Node 2
- queue_Node3 for Node 3.
Modifications occurring in a single thread will always be added to the same queue. This is to keep the order in which they'll be replicated in case the two events within a thread are dependant on each other.
Replicating cache modifications from local queues to other nodes
After the cache modifications are added to local queues, they're being handled by another thread. Each queue has a single thread responsible for:
- Reading a cache modification request from the queue.
- Delivering the cache modification request to the destination node over RMI.
- Removing the cache modification request from the queue.
To use the same example where the cache modification was added to queue_Node2 and queue_Node3 on Node 1, the next steps for this modification are the following:
- A thread responsible for handling queued modifications reads the modification from local queue queue_Node2, and tries to deliver it over RMI to Node 2.
- Another thread responsible for handling queued modifications reads the modification from local queue queue_Node3, and tries to deliver it over RMI to Node 3.
If modifications are successfully replicated, they're removed from the queue. If the replication failed, errors will be written in the logs.
Monitoring cache replication
You can monitor cache replication by reviewing statistics that are written in the log file. They’ll show you the size of the local queues, and whether cache modifications are successfully replicated or persisted in the queues for too long. In most cases, monitoring just a few parameters will tell you if the replication is working properly.
For more info, see Monitoring the cache replication.
Configuring cache replication
You can configure some options of cache replication, such as the maximum number of modifications in the queue or the frequency of saving the statistics.
Note that these are system properties.
More details
Expand the sections below to understand the cache replication in more detail.