Upgrade a Confluence cluster through the API without downtime

This document provides guidance on how to initiate and finalize a rolling upgrade through API calls. This upgrade method is suitable for admins with the skills and automation tools to orchestrate maintenance tasks (like upgrades).

For an overview of rolling upgrades (including planning and preparation information), see Upgrade Confluence without downtime

API reference

The entire rolling upgrade process is governed by the following API:

http://<host>:<port>/rest/zdu/cluster/zdu/

This API has the following calls:

Get an overview of the cluster's status.
Enable upgrade mode.
Get the status of the cluster. 
Get an overview of a node's status, including the number of running tasks.
Disable upgrade mode. You can only use this call if the upgrade progress is not MIXED.
Once all nodes are upgraded, finalize the rolling upgrade. This will automatically disable upgrade mode.

For detailed information about each API call, see Confluence REST API Documentation.

Initiating a rolling upgrade

To initiate a rolling upgrade, enable rolling upgrade first. To do this, use:

http://<host>:<port>/rest/zdu/cluster/zdu/start

Upgrade mode allows your cluster to temporarily accept nodes running different Confluence versions. This lets you upgrade a node and let it rejoin the cluster (along with the other non-upgraded nodes). Both upgraded and non-upgraded active nodes work together to keep Confluence available to all users. You can disable upgrade mode as long as you haven’t upgraded any nodes yet.

Upgrading each node individually

Before you upgrade a node, you'll need to gracefully shut down Confluence on it. To do this, run the stop script corresponding to your operating system and configuration. Learn more about graceful Confluence shutdowns.

For example, if you installed Confluence as a service on Linux, run the following command:

$ sudo /etc/init.d/confluence stop

After upgrading Confluence on the node, wait for it to transition to an Active status first before upgrading another node.

Node statuses

To get the status of a node, use:

http://<host>:<port>/rest/zdu/cluster/zdu/nodes/<nodeID>

ACTIVEConfluence is connected to the cluster and running with no errors. 
STARTINGConfluence is still loading, and should transition to Active once finished.
TERMINATINGConfluence was gracefully shut down, and should transition to Offline once finished.
OFFLINEConfluence is not responding on the node. This node will be removed from the cluster completely if it is still offline after Upgrade mode is disabled.
ERRORSomething went wrong with Confluence on the node.

Cluster statuses

To get the status of the cluster, use:

http://<host>:<port>/rest/zdu/cluster/zdu/state

STABLEYou can turn on Upgrade mode now. 
READY_TO_UPGRADE

Upgrade mode is enabled, but no nodes have been upgraded yet. You can start upgrading your first node now. 

MIXEDAt least one node is upgraded, but you haven't finished upgrading all nodes yet. Your cluster has nodes running different Confluence versions. You need to upgrade all nodes to the same bug fix version to transition to the next status (READY_TO_RUN_UPGRADE_TASKS).
READY_TO_RUN_UPGRADE_TASKS

All nodes have node been upgraded. You can now finalize the rolling upgrade:

http://<host>:<port>/rest/zdu/cluster/zdu/approve

Enable and disable Upgrade mode

How you roll back depends on the upgrade stage you have reached. See Roll back a rolling upgrade for more information. 

Mixed status with Upgrade mode disabled

If a node is in an Error state with Upgrade mode disabled, you can't enable Upgrade mode. Fix the problem or remove the node from the cluster to enable Upgrade mode.

Troubleshooting

Node errors during rolling upgrade

If a node’s status transitions to Error, it means something went wrong during the upgrade. You can’t finish the rolling upgrade if any node has an Error status. However, you can still disable Upgrade mode as long as the cluster status is still Ready to upgrade.

There are several ways to address this:

  • Shut down Confluence gracefully on the node. This should disconnect the node from the cluster, allowing the node to transition to an Offline status.

  • If you can’t shut down Confluence gracefully, shut down the node altogether.

Once all active nodes are upgraded with no nodes in Error, you can finalize the rolling upgrade. You can investigate any problems with the problematic node afterwards and re-connect it to the cluster once you address the error.

Disconnecting a node from the cluster through the load balancer

If a node error prevents you from gracefully shutting down Confluence, try disconnecting it from the cluster through the load balancer. The following table provides guidance how to do so for popular load balancers.
NGINX

NGINX defines groups of cluster nodes through the upstream directive . To prevent the load balancer from connecting to a node, delete the node's entry from its corresponding upstream group. Learn more about the upstream directive in the ngx_http_upstream_module module.

HAProxy

With HAProxy, you can disable all traffic to the node by putting it in a maint state:

set server <node IP or hostname> state maint

Learn more about forcing a server's administrative state.

Apache

You can disable a node (or "worker") by setting its activation member attribute to disabled. Learn more about advanced load balancer worker properties in Apache.

Azure Application GatewayWe provide a deployment template for Confluence Data Center on Azure; this template uses the Azure Application Gateway as its load balancer. The Azure Application Gateway defines each node as a target within a backend pool. Use the Edit backend pool interface to remove your node's corresponding entry. Learn more about adding (and removing) targets from a backend pool.

Traffic is disproportionately distributed during or after upgrade

Some load balancers might use strategies that send a disproportionate amount of active users to a newly-upgraded node. When this happens, the node might become overloaded, slowing down Confluence for all users logged in to the node.

To address this, you can also temporarily disconnect the node from the cluster. This will force the load balancer to re-distribute active users between all other available nodes. Afterwards, you can add the node again to the cluster.

Node won't start up 

If a node is Offline or Starting for too long, you may have to troubleshoot Confluence on the node directly. See Confluence Startup Problems Troubleshooting for related information.

Last modified on Nov 10, 2020

Was this helpful?

Yes
No
Provide feedback about this article
Powered by Confluence and Scroll Viewport.