Cluster panic is triggered in Confluence Data Center when a node rejoins the cluster
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Problem
Cluster panic is triggered in Confluence Data Center when a node rejoins the cluster. Nothing is written to atlassian-confluence.log except a warning that Hazelcast is terminating forcefully.
The problem occurs in the following sequence:
- All nodes (for example, nodes 1, 2, and 3) have joined the cluster and are running normally
- One node (for example, node 3) is shut down and leaves the cluster
- Nodes 1 and 2 remain in the cluster. When node 3 starts again and rejoins the cluster, nodes 1 and 2 go into panic mode
- Hazelcast terminates forcefully on some or all nodes
Logging similar to the following appears in atlassian-confluence.log:
2021-01-27 22:08:43,894 WARN [hz.ShutdownThread] [com.hazelcast.instance.Node] log [xxx.xxx.xxx.xxx]:5801 [confluenceCluster] [3.8.6] Terminating forcefully...
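Because this warning is the only trace of the problem, it can help to pull its timestamps out of each node's log and compare them with the time the node rejoined. The following is a minimal sketch, assuming the log lines follow the same layout as the sample above; the function name find_forced_terminations is hypothetical and not part of any Atlassian tooling.

```python
import re

# Pattern derived from the sample line above: timestamp, WARN level,
# the hz.ShutdownThread thread name, and the "Terminating forcefully" text.
FORCED_TERMINATION = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN "
    r"\[hz\.ShutdownThread\].*Terminating forcefully"
)


def find_forced_terminations(lines):
    """Return the timestamps of lines that report a forced Hazelcast shutdown."""
    timestamps = []
    for line in lines:
        match = FORCED_TERMINATION.match(line)
        if match:
            timestamps.append(match.group("timestamp"))
    return timestamps
```

The function accepts any iterable of lines, so an open file handle for atlassian-confluence.log can be passed in directly.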
Cause
The nodes are unable to communicate consistently over multicast. Network communication tools such as Omping show no communication errors while all nodes are in the cluster, but communication breaks down when Hazelcast terminates on one of the nodes.
Workaround
The workaround for Confluence Data Center versions 5.9 and above is to move from using multicast to unicast.
If you're setting up Confluence Data Center for the first time, the setup process steps you through choosing your discovery mode and adding cluster nodes. If you decide to change the node discovery for the cluster, you'll need to edit the confluence.cfg.xml file in the local home directory of each cluster node.
- Before you make any changes, shut down all nodes in your cluster
- Make sure the discovery configuration is exactly the same for each node (make the same changes to the confluence.cfg.xml file in each local home directory)
- Always perform a safety backup before making manual edits to these files
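One way to confirm that the discovery configuration really is identical on every node is to compare the relevant properties from each node's confluence.cfg.xml. The sketch below is illustrative only: the helper names are hypothetical, it compares only the discovery keys named in this article, and it assumes each file is well-formed XML containing property elements with a name attribute.

```python
import xml.etree.ElementTree as ET

# Discovery-related keys named in this article. Other confluence.cluster.*
# keys are deliberately ignored by this sketch.
DISCOVERY_KEYS = (
    "confluence.cluster.join.type",
    "confluence.cluster.address",
    "confluence.cluster.peers",
)
AWS_PREFIX = "confluence.cluster.aws."


def discovery_properties(cfg_xml_text):
    """Extract the discovery properties from the text of a confluence.cfg.xml file."""
    root = ET.fromstring(cfg_xml_text)
    found = {}
    for prop in root.iter("property"):
        name = prop.get("name") or ""
        if name in DISCOVERY_KEYS or name.startswith(AWS_PREFIX):
            found[name] = (prop.text or "").strip()
    return found


def find_mismatches(configs_by_node):
    """Return {property name: {node: value}} for properties that differ between nodes.

    configs_by_node maps a node label to the text of that node's file.
    A property that is missing on a node is reported with the value None.
    """
    parsed = {node: discovery_properties(text) for node, text in configs_by_node.items()}
    all_names = set().union(*parsed.values())
    mismatches = {}
    for name in sorted(all_names):
        values = {node: props.get(name) for node, props in parsed.items()}
        if len(set(values.values())) > 1:
            mismatches[name] = values
    return mismatches
```

An empty result means the discovery properties match on every node that was compared.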
The changes you need to make may differ slightly, depending on whether you've upgraded from an older version of Confluence Data Center or started with version 5.9. Both methods are detailed below.
To change from multicast to TCP/IP
Look for the following two lines in the confluence.cfg.xml file:
<property name="confluence.cluster.address">[multicast IP]</property>
<property name="confluence.cluster.join.type">multicast</property>
If both lines exist in the file, change them to the lines below. If the confluence.cluster.address property exists but there's no reference to the confluence.cluster.join.type property, update the first line and add the second line as shown below.
<property name="confluence.cluster.peers">[node 1 IP],[node 2 IP],[node 3 IP]</property> <!-- A comma-separated list of node IP addresses, without spaces -->
<property name="confluence.cluster.join.type">tcp_ip</property> <!-- accepted values are multicast, tcp_ip, or aws -->
Enter the address of each node, separating the addresses with commas. Make sure to remove the brackets from around the IP addresses.
You can now restart your cluster nodes.
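The multicast to TCP/IP edit described above can also be expressed programmatically. This is a rough sketch under stated assumptions, not an Atlassian-provided tool: the function names are hypothetical, the example addresses in the usage are placeholders, and it assumes the discovery properties are sibling property elements. Treat it as a way to check your manual edit rather than a replacement for it, and keep the safety backup either way.

```python
import ipaddress
import xml.etree.ElementTree as ET


def format_peers(addresses):
    """Join node IPs into the comma-separated, space-free form the peers key expects."""
    # ip_address() raises ValueError for anything that is not a bare IP,
    # including values that still carry the [ ] brackets from the template.
    cleaned = [str(ipaddress.ip_address(address.strip())) for address in addresses]
    if not cleaned:
        raise ValueError("at least one node address is required")
    return ",".join(cleaned)


def switch_to_tcp_ip(root, addresses):
    """Rewrite the discovery properties under root in place. Returns True if changed."""
    peers_value = format_peers(addresses)
    for parent in root.iter():
        props = {child.get("name"): child for child in parent.findall("property")}
        address = props.get("confluence.cluster.address")
        if address is None:
            continue
        # Update the first line: the multicast address becomes the peers list.
        address.set("name", "confluence.cluster.peers")
        address.text = peers_value
        # Update the second line, or add it if the file predates version 5.9.
        join_type = props.get("confluence.cluster.join.type")
        if join_type is None:
            join_type = ET.SubElement(
                parent, "property", {"name": "confluence.cluster.join.type"}
            )
        join_type.text = "tcp_ip"
        return True
    return False
```

For example, after parsing a file with ET.parse, calling switch_to_tcp_ip(tree.getroot(), ["10.0.0.1", "10.0.0.2", "10.0.0.3"]) updates the tree in memory. Writing the tree back with ElementTree discards XML comments by default, which is one reason to compare the result against your backup before using it.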
To change from multicast to AWS
Look for the following two lines in the confluence.cfg.xml file and remove them:
<property name="confluence.cluster.address">[multicast IP]</property>
<property name="confluence.cluster.join.type">multicast</property>
Depending on which type of credentials you are passing to Confluence, add one of the following two blocks with your AWS configuration.
Option 1: For Access Key/Secret Key based credentials:
<property name="confluence.cluster.join.type">aws</property>
<property name="confluence.cluster.aws.host.header">[---VALUE---]</property>
<property name="confluence.cluster.aws.region">[---VALUE---]</property>
<property name="confluence.cluster.aws.tag.key">[---VALUE---]</property>
<property name="confluence.cluster.aws.tag.value">[---VALUE---]</property>
<property name="confluence.cluster.aws.access.key">[---VALUE---]</property>
<property name="confluence.cluster.aws.secret.key">[---VALUE---]</property>
Option 2: For IAM role based credentials:
<property name="confluence.cluster.join.type">aws</property>
<property name="confluence.cluster.aws.host.header">[---VALUE---]</property>
<property name="confluence.cluster.aws.region">[---VALUE---]</property>
<property name="confluence.cluster.aws.tag.key">[---VALUE---]</property>
<property name="confluence.cluster.aws.tag.value">[---VALUE---]</property>
<property name="confluence.cluster.aws.iam.role">[---VALUE---]</property>
To change from TCP/IP to AWS
Look for the following two lines in the confluence.cfg.xml file and remove them:
<property name="confluence.cluster.join.type">tcp_ip</property>
<property name="confluence.cluster.peers">[node 1 IP],[node 2 IP],[node 3 IP]</property>
Depending on which type of credentials you are passing to Confluence, add one of the following two blocks with your AWS configuration.
Option 1: For Access Key/Secret Key based credentials:
<property name="confluence.cluster.join.type">aws</property>
<property name="confluence.cluster.aws.host.header">[---VALUE---]</property>
<property name="confluence.cluster.aws.region">[---VALUE---]</property>
<property name="confluence.cluster.aws.tag.key">[---VALUE---]</property>
<property name="confluence.cluster.aws.tag.value">[---VALUE---]</property>
<property name="confluence.cluster.aws.access.key">[---VALUE---]</property>
<property name="confluence.cluster.aws.secret.key">[---VALUE---]</property>
Option 2: For IAM role based credentials:
<property name="confluence.cluster.join.type">aws</property>
<property name="confluence.cluster.aws.host.header">[---VALUE---]</property>
<property name="confluence.cluster.aws.region">[---VALUE---]</property>
<property name="confluence.cluster.aws.tag.key">[---VALUE---]</property>
<property name="confluence.cluster.aws.tag.value">[---VALUE---]</property>
<property name="confluence.cluster.aws.iam.role">[---VALUE---]</property>
You can now restart your cluster nodes.
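Before restarting, it may be worth checking that the AWS block is complete and that no multicast or TCP/IP leftovers remain. The sketch below is one possible check, with several assumptions: the function name is hypothetical, it treats every property shown in the blocks above as required, and it expects exactly one of the two credential options. Confluence itself may be more lenient than this.

```python
# Property names copied from the two AWS blocks above.
COMMON_KEYS = (
    "confluence.cluster.aws.host.header",
    "confluence.cluster.aws.region",
    "confluence.cluster.aws.tag.key",
    "confluence.cluster.aws.tag.value",
)
ACCESS_KEY_KEYS = (
    "confluence.cluster.aws.access.key",
    "confluence.cluster.aws.secret.key",
)
IAM_ROLE_KEY = "confluence.cluster.aws.iam.role"
# Keys that belong to the multicast and TCP/IP modes and should have been removed.
OLD_DISCOVERY_KEYS = ("confluence.cluster.address", "confluence.cluster.peers")
PLACEHOLDER = "[---VALUE---]"


def check_aws_discovery(props):
    """Return a list of problems in a {property name: value} mapping (empty if none)."""
    problems = []
    if props.get("confluence.cluster.join.type") != "aws":
        problems.append("confluence.cluster.join.type must be aws")
    for name in COMMON_KEYS:
        if not props.get(name):
            problems.append(f"missing {name}")
    # Assumption of this sketch: exactly one credential option is configured.
    has_key_pair = all(props.get(name) for name in ACCESS_KEY_KEYS)
    has_iam_role = bool(props.get(IAM_ROLE_KEY))
    if has_key_pair == has_iam_role:
        problems.append("set exactly one of: access/secret key pair, IAM role")
    for name in OLD_DISCOVERY_KEYS:
        if name in props:
            problems.append(f"remove {name}")
    for name, value in props.items():
        if value == PLACEHOLDER:
            problems.append(f"{name} still contains the placeholder")
    return problems
```

The input mapping can be built from the property elements of confluence.cfg.xml with any XML parser.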
Note that if you're using a CloudFormation YAML template, it must include at least the values below, and the same values should be reflected on the AWS side. If you switch to the AWS cluster join type, also review Running Confluence Data Center in AWS and make sure you have the following set up in your YAML:
Key: Cluster
Value: !Ref AWS::StackName
PropagateAtLaunch: true
To change from TCP/IP to multicast
To switch from TCP/IP to multicast, just perform the reverse of the changes outlined above.
Reference of properties in the confluence.cfg.xml file
Key: confluence.cluster.join.type
Valid values: 'multicast', 'tcp_ip', or 'aws'
Notes: Pre-5.9 Data Center installations won't have this key. By default, if the key is missing, Confluence will choose multicast.

Key: confluence.cluster.address
Valid values: a single multicast IP address
Notes: This key is only used by Confluence if confluence.cluster.join.type is set to multicast.

Key: confluence.cluster.peers
Valid values: a comma-separated string of IP addresses (no spaces)
Notes: There must be at least one address here. The addresses are the IP address of each node in the cluster, for example
<property name="confluence.cluster.peers">[node 1 IP],[node 2 IP],[node 3 IP]</property>
This key is only used by Confluence if confluence.cluster.join.type is set to tcp_ip.

Key: confluence.cluster.authentication.enabled
Valid values: true, false
Notes: Set this property to false if you don't want to authenticate Confluence nodes as they join the cluster. This is not recommended.

Key: confluence.cluster.authentication.secret
Valid values: (automatically generated)
Notes: Set this property to change the shared secret used to authenticate nodes as they join the cluster. The secret must be a string of maximum 40 characters.
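The rules in the reference above can be turned into a simple pre-restart check. The following is a minimal sketch that encodes only what the reference states; the function name validate_cluster_properties is hypothetical, and the input is assumed to be a plain mapping of property names to values read from confluence.cfg.xml.

```python
VALID_JOIN_TYPES = ("multicast", "tcp_ip", "aws")
MAX_SECRET_LENGTH = 40


def validate_cluster_properties(props):
    """Return a list of rule violations for a {property name: value} mapping."""
    problems = []
    # If the key is missing, Confluence chooses multicast by default.
    join_type = props.get("confluence.cluster.join.type", "multicast")
    if join_type not in VALID_JOIN_TYPES:
        problems.append(f"unknown join type: {join_type}")
    if join_type == "multicast" and not props.get("confluence.cluster.address"):
        problems.append("multicast needs confluence.cluster.address")
    if join_type == "tcp_ip":
        peers = props.get("confluence.cluster.peers", "")
        # At least one address, comma-separated, no spaces, no empty entries.
        if not peers or " " in peers or "" in peers.split(","):
            problems.append("confluence.cluster.peers is empty or badly formatted")
    enabled = props.get("confluence.cluster.authentication.enabled")
    if enabled is not None and enabled not in ("true", "false"):
        problems.append("authentication.enabled must be true or false")
    secret = props.get("confluence.cluster.authentication.secret")
    if secret is not None and len(secret) > MAX_SECRET_LENGTH:
        problems.append("authentication.secret is longer than 40 characters")
    return problems
```

An empty list means none of the rules listed in the reference were violated. It does not guarantee that the nodes can actually reach each other.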
Confluence Data Center versions prior to 5.9 do not have the option to use unicast, so the workaround is not applicable. However, a similar issue has been addressed for versions 5.8.5 and above: CONF-39396