Cluster Cache Replication health check fails in Jira Data Center
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data-Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Overview
Jira Data Center cluster replication relies on each node being recorded in the database and on the nodes sending and receiving cache updates. The health check confirms that replication is working across the entire cluster. If an active node is not responding, the other nodes report warnings and the node with the error reports a critical result.
Understanding the Results
Result | What this means |
---|---|
The health check passed successfully. | Node replication within the cluster is working. |
The node <node> is not in the database. | The node does not appear in the database but does exist in the replication cache, or the node is unresponsive. |
The node <node> is not replicating. | The node is not replicating information to the cluster: it exists in the database but not in the replication cache. |
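The distinction between the two failure results can be illustrated as a set comparison. The following is a minimal sketch, not Jira's actual health check implementation; the function name `classify_nodes` and the idea of passing both node lists in as plain collections are assumptions made for illustration.

```python
def classify_nodes(db_nodes, cache_nodes):
    """Map each problematic node ID to the health check result it would show.

    db_nodes:    node IDs recorded in the database (hypothetical input)
    cache_nodes: node IDs present in the replication cache (hypothetical input)
    """
    db_nodes, cache_nodes = set(db_nodes), set(cache_nodes)
    results = {}
    # In the replication cache but missing from the database.
    for node in sorted(cache_nodes - db_nodes):
        results[node] = f"The node {node} is not in the database."
    # In the database but missing from the replication cache.
    for node in sorted(db_nodes - cache_nodes):
        results[node] = f"The node {node} is not replicating."
    return results
```

A node that appears in both collections produces no entry, which corresponds to the passing result. The sketch does not cover the case where a node is simply unresponsive.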
Inconsistent state across the cluster
Jira keeps some data in memory local to each node, especially frequently used data such as permissions. Cache synchronization is asynchronous (7.9 and later) but is expected to be fast and consistent across nodes. Cache changes are communicated and replicated over the network.
Symptoms:
- Users exist on some nodes but not all.
- Users may have permissions on some nodes but not all.
- User field dropdown shows results on some nodes but not all.
- Filters and gadgets show up on one node but not on others after a permission update.
Troubleshooting
Problem | Suggestion |
---|---|
The node is not in the database. | Restart the affected node. Before restarting, collect some thread dumps as described in Generating a Thread Dump; these can be sent to support along with the information below. |
The node is not replicating due to a network condition. | A request has been raised to have these configuration options documented. |
The node is not replicating due to the nodes being offline. | Check the status of each of the other nodes, specifically whether they are online and responsive. |
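
Checking whether each node is online and responsive can be scripted. The following is a minimal sketch, assuming each node can be reached directly (bypassing the load balancer) and that its `/status` endpoint returns a JSON body with a `state` field whose value is `RUNNING` on a healthy node. The helper names and the node URLs in the usage comment are hypothetical.

```python
import json
import urllib.error
import urllib.request


def interpret_status(http_code, body):
    """Return True only if the response indicates a running node."""
    if http_code != 200:
        return False
    try:
        return json.loads(body).get("state") == "RUNNING"
    except (ValueError, AttributeError):
        # Body was not JSON, or was JSON but not an object.
        return False


def check_node(base_url, timeout=5):
    """Query <base_url>/status and report whether the node looks healthy."""
    url = base_url.rstrip("/") + "/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return interpret_status(resp.status, body)
    except urllib.error.HTTPError as exc:
        # The node answered, but with an error status code.
        return interpret_status(exc.code, "")
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: treat as offline.
        return False


# Example usage (hypothetical node addresses):
# for node in ["http://jira-node1:8080", "http://jira-node2:8080"]:
#     print(node, "OK" if check_node(node) else "NOT RESPONDING")
```

A node that fails this check is a candidate for the offline case above; a node that passes but still fails the health check points more toward a network or cache replication problem.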
You can monitor cache replication by reviewing statistics that are written in the log file. They’ll show you the size of the local queues, and whether cache modifications are successfully replicated or persisted in the queues for too long. In most cases, monitoring just a few parameters will tell you if the replication is working properly. For more info, see Monitoring the cache replication.
Providing Information to Support
If you are unable to troubleshoot and fix the problem yourself, create a support ticket at support.atlassian.com and attach the following information to the ticket:
- A screenshot of the Health Check results.
- A Support ZIP from each of the Data Center nodes.
- Any information collected by following the suggestions in this document.