Scaling Jira Data Center
Jira Data Center provides high availability for your application, and it also lets you scale your cluster horizontally by adding nodes, ensuring application performance as concurrent usage grows. Actual performance depends on your environment, but Atlassian measured performance increases from additional nodes in an internal test deployment. It's important to be aware of some implementation details when adding nodes to your cluster so you can properly scale Jira Data Center to fit your needs. This article gives a brief overview of things to consider when scaling your application.
Additional components
Your Data Center deployment must include the following components:
Load balancer
The load balancer is a fundamental component of Data Center that facilitates high availability and scalability of your application. The most important requirement for your load balancer is that it supports sticky sessions, so that all requests in a user's session go to the same node. Consult your load balancer's documentation for the specific configuration steps.
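As an illustration, the following sketch shows a sticky-session setup using NGINX as the load balancer; the node hostnames, port, and the choice of NGINX itself are assumptions, and your own load balancer will have its own equivalent settings:

```
# Hypothetical NGINX front end for a two-node Jira Data Center cluster.
upstream jira_cluster {
    # ip_hash pins each client IP to the same node (sticky sessions).
    ip_hash;
    server jira-node1.example.com:8080;
    server jira-node2.example.com:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://jira_cluster;
        # Pass the original host and client address through to Jira.
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Cookie-based stickiness works just as well; the key point is that a user's session stays pinned to one node.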
Shared database
When setting up your shared database, ensure that it allows at least as many connections as all of your Jira Data Center nodes can open combined. For example, if you have three Jira Data Center nodes and each node supports a maximum of 80 connections, then your shared database must allow at least 240 connections. You will likely also need to configure additional connections for administrative or other purposes. Finally, make sure you're deploying a database that Jira Data Center supports.
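As a concrete sketch, with PostgreSQL as the shared database and the three-node example above, the sizing math might translate into configuration like this (the pool size of 80 and the 20-connection headroom are illustrative values, not recommendations):

```
# postgresql.conf on the shared database server -- hypothetical sizing:
# 3 nodes x 80 connections per node = 240, plus headroom for admin tools.
max_connections = 260
```

On the Jira side, each node's maximum pool size is set by the pool-max-size value in its dbconfig.xml file.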
Shared storage
All Data Center deployments require a remote shared file system, such as network-attached storage accessed over a file system protocol like NFS. All nodes in the cluster must have access to a shared home directory at the same path. The shared file system stores plugins, a copy of the Lucene index, attachments, avatars, and other data used by the Jira application.
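For example, an NFS-backed shared home might be mounted identically on every node via an /etc/fstab entry like the sketch below; the server name, export path, and mount point are hypothetical:

```
# /etc/fstab on every Jira node -- same mount point on all nodes.
nfs-server.example.com:/export/jira-shared  /data/jira/sharedhome  nfs  rw,hard,nfsvers=4  0 0
```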
Node communication
Each Jira Data Center node communicates with the other nodes to keep their local Lucene indexes and in-memory caches synchronized. For example, when a user updates an issue on one Jira node, that node updates its local Lucene index, writes the change to the shared database, and then tells the other nodes that the index changed. The other nodes then read the changed data from the database and apply it to their own local Lucene indexes.
The Lucene index is eventually consistent, meaning there will be some delay in updating the index across all nodes.
Additionally, each node in Jira Data Center replicates its in-memory cache to all other nodes using a specially configured Ehcache over RMI. User and group permissions are cached in each node's memory, among other data items, so if a group permission changes, that change is replicated to every node in the cluster.
Network latency
Given how much the nodes communicate, we recommend running your cluster on a high-speed network and optimizing as much as possible for low latency. High latency can prevent your nodes from staying synchronized with each other, and nodes may appear to go offline and come back online erratically. Co-locating the cluster nodes in the same physical location and on the same subnet is one way to help reduce latency.
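A quick way to spot-check inter-node latency is an ordinary ping between nodes (the hostnames here are hypothetical); co-located nodes on the same subnet typically show sub-millisecond round-trip times:

```
# From jira-node1, measure round-trip latency to jira-node2.
ping -c 20 jira-node2.example.com
```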
Currently, Data Center does not support geo-clustering.
Adding nodes
Since each node communicates with the others, adding nodes to your cluster also adds network traffic. Use network monitoring tools to make sure your cluster can handle the increased traffic, both between nodes and to your database and shared file system. Running network monitoring and performance tests against your application will help you understand the total number of nodes your environment can gracefully handle. Atlassian's internal testers have tested up to four application nodes for Jira Data Center. If you find that you're hitting a practical maximum number of nodes in your cluster, consider upgrading (also known as vertically scaling) your existing nodes to get more capacity and performance.
Security
To secure your application, use a firewall, network segregation, or both to ensure that only permitted nodes can connect to Jira Data Center's Ehcache RMI ports. If you use a firewall, you must open the Ehcache ports between the nodes, or you may see cache replication issues. For Jira versions 7.3 and later, the two default Ehcache RMI ports are 40001 and 40011. For Jira versions 7.2 and earlier, the first default port is 40001, and the second port is a TCP ephemeral port, which requires a range of open ports. This range is operating system-specific and is usually tunable with a registry setting (Windows) or kernel parameter (UNIX/Linux).
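As a sketch, on Jira 7.3 or later with the default ports, a host firewall using iptables might permit cache replication only from the cluster's private subnet (the 10.0.8.0/24 subnet is an assumption):

```
# Allow Ehcache RMI traffic (Jira 7.3+ defaults: 40001 and 40011)
# only from the cluster subnet, and drop it from everywhere else.
iptables -A INPUT -p tcp -s 10.0.8.0/24 --dport 40001 -j ACCEPT
iptables -A INPUT -p tcp -s 10.0.8.0/24 --dport 40011 -j ACCEPT
iptables -A INPUT -p tcp --dport 40001 -j DROP
iptables -A INPUT -p tcp --dport 40011 -j DROP
```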
Node discovery
Your cluster nodes need to know about each other to communicate. When a node starts up, it attempts to discover any other running nodes. If it finds any, the new node tries to join the existing cluster; if it cannot find any other nodes, it starts its own cluster. Jira Data Center supports two types of node discovery:
- Database discovery (default)
- Multicast
With database discovery, the Jira Data Center database contains a clusternode table that lists all active nodes and their associated IP addresses. Each node must check in periodically by updating the clusternodeheartbeat table; if a node's most recent update is older than 5 minutes, the node is considered down. We strongly recommend database discovery over multicast: it is much more efficient, and we have observed stability issues when using multicast. Additionally, AWS does not support multicast, so database discovery is required if you're deploying Jira Data Center in AWS.
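If you need to verify which nodes the cluster considers alive, you can query these tables directly; this is a sketch, and the exact column names (node_id, ip, heartbeat_time) are assumptions that may vary between Jira versions:

```
-- List registered nodes and their last heartbeat (epoch milliseconds).
SELECT n.node_id, n.ip, h.heartbeat_time
FROM clusternode n
JOIN clusternodeheartbeat h ON h.node_id = n.node_id;
```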
For information on configuring node discovery, read how to configure the cluster.properties file parameters when installing Jira Data Center.
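For reference, a single node's cluster.properties might look like the sketch below; the node ID and shared home path are illustrative, and the two Ehcache port overrides are optional (shown here at their Jira 7.3+ defaults):

```
# cluster.properties on one node -- hypothetical values.
jira.node.id = node1
jira.shared.home = /data/jira/sharedhome
# Optional: override the default Ehcache RMI ports.
ehcache.listener.port = 40001
ehcache.object.port = 40011
```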