How we make sure Confluence Data Center stays enterprise-ready
If you're a large enterprise interested in deploying Confluence Data Center, you may be curious how we ensure it can handle heavy enterprise workloads. In Confluence Data Center sample deployment and monitoring strategy, we discussed how we deploy, maintain, and monitor our own Confluence Data Center instance (which, by the way, always runs the latest version). This lets us see firsthand how it performs in a production environment, under one of the heaviest real-life workloads we've ever seen applied to Confluence.
Running and using Confluence Data Center ourselves is only half of the picture, though. In this article, we discuss our performance testing methodology. We'll explore how we built a series of performance tests into each phase of the release process, covering everything from preventing performance regressions to maintaining satisfactory performance under different loads. These tests help us ensure that each version can withstand different enterprise-grade workloads (as described in Confluence Data Center load profiles).
You'll also learn more about our performance testing environment and test harness. We'll cover our methods for testing heavy, enterprise-grade workloads on Confluence Data Center.
Performance testing throughout development
Our developers review the possible performance impact of each proposed feature and change to Confluence. They identify any risks to our performance targets and write appropriate tests to check the extent of the risk. Quality Engineers provide technical support or validation whenever developers need it.
Here's a high-level view of this process. Developers embed performance tests throughout every stage of a feature's lifecycle, namely:
Development
Developers run performance tests directly on their local machines, where they can test the performance impact of code changes in isolation. This provides a quick feedback loop, and enables them to test the impact of incremental changes in a very focused way. For example, when changing a specific macro's front-end, a developer can disable other performance tests and focus only on a specific page containing the macro.
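For illustration, here's a minimal sketch of what such a focused local check could look like, assuming a local Confluence instance and a page that exercises the macro under change. The URL, credentials, page ID, and sample count are placeholders, not our actual test code.

```python
# A minimal sketch of a focused local performance check, assuming a developer
# wants to time a single page that contains the macro being changed.
# The base URL, credentials, and page ID below are hypothetical placeholders.
import statistics
import time

import requests

BASE_URL = "http://localhost:8090"   # local Confluence instance (assumption)
PAGE_ID = "123456"                   # page that exercises the macro (assumption)
SAMPLES = 20

session = requests.Session()
session.auth = ("admin", "admin")    # local test credentials (assumption)

timings = []
for _ in range(SAMPLES):
    start = time.monotonic()
    response = session.get(f"{BASE_URL}/pages/viewpage.action", params={"pageId": PAGE_ID})
    response.raise_for_status()
    timings.append(time.monotonic() - start)

print(f"median: {statistics.median(timings):.3f}s  "
      f"p95: {sorted(timings)[int(0.95 * SAMPLES) - 1]:.3f}s")
```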
Feature branch
Once developers are happy with the feature's performance in isolation, they start testing whether it impacts other Confluence functionality. To do this, they push their changes to a feature branch, which triggers Feature branch builds on our continuous integration (CI) pipeline.
Not all features undergo this performance test. If a feature poses a low risk of introducing a performance regression, a developer can merge the feature directly to the master branch.
Master branch
All features are eventually merged into the master branch. From there, our CI pipeline compiles Confluence with all other new, merged features. Each new build is installed on a test instance, then subjected to one of the heaviest Confluence workloads we've seen in production.
Mandatory performance testing in the master branch provides an extra layer of protection against performance regressions. It ensures that all new features undergo performance testing, one way or another.
Release Candidate
Every release candidate undergoes a final performance test, and we compare its results with established baselines to ensure it can perform well under heavy workloads.
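To illustrate the baseline comparison step, here's a hedged sketch that flags transactions whose 95th-percentile response time drifts beyond a tolerance. The transaction names, numbers, and 10% threshold are purely illustrative, not our real baselines or thresholds.

```python
# A hedged sketch of a baseline comparison, assuming per-transaction p95
# response times (in ms) are exported from both the baseline and the candidate
# run as simple dictionaries. All values and the tolerance are illustrative.
BASELINE_P95_MS = {"view_page": 320, "edit_page": 540, "search": 410}
CANDIDATE_P95_MS = {"view_page": 335, "edit_page": 610, "search": 398}
TOLERANCE = 0.10  # flag regressions worse than 10% over baseline (illustrative)

regressions = {
    action: (baseline, CANDIDATE_P95_MS[action])
    for action, baseline in BASELINE_P95_MS.items()
    if CANDIDATE_P95_MS.get(action, float("inf")) > baseline * (1 + TOLERANCE)
}

if regressions:
    for action, (baseline, candidate) in regressions.items():
        print(f"REGRESSION {action}: {candidate} ms vs baseline {baseline} ms")
    raise SystemExit(1)  # fail the release candidate check
print("All transactions within tolerance of baseline")
```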
Enterprise readiness testing methodology
Aside from protecting us against performance regressions, the performance tests we run on the master branch also help us study how a build withstands different workloads. One of these is our enterprise readiness workload, where we apply large-sized content and heavy traffic to Confluence (see Confluence Data Center load profiles for the different content and traffic profiles).
Test environment and architecture
We run our enterprise readiness workload tests on an Amazon Web Services (AWS) Virtual Private Cloud, which is made up of the following AWS instances:
Function | AWS node type | Number of nodes |
---|---|---|
Confluence application | c5.2xlarge (running Amazon Linux) | 4 |
Synchrony (for collaborative editing) | c5.2xlarge (running Amazon Linux) | 1 |
Load balancer | AWS Application Load Balancer | 1 |
Database | m4.xlarge | 1 |
Apache JMeter | c4.2xlarge | 1 |
Shared home directory | m4.large | 1 |
Refer to the AWS documentation on Instance Types (specifically, General Purpose Instances and Compute-Optimized Instances) for details on each node type.
Our database runs on PostgreSQL (via Amazon RDS) and is pre-loaded with a large-sized data set (or content profile). In the future, we will be adding identical test environments that feature other database providers.
All of the Confluence application nodes mount the shared home folder hosted on an NFS server.
Technical components
Our test harness is composed of the following components:
Component | Role | Description |
---|---|---|
JMeter | Load injector | We generate our test traffic through JMeter. Specifically, we use JMeter configuration scripts to simulate HTTP calls with provided parameters. These parameters include usernames, spaces, and pages to edit or view. We run JMeter in a distributed manner so we can scale the load easily. JMeter doesn't execute or evaluate JavaScript, nor does it actually render the page. To address this, we run Selenium browsers simultaneously with JMeter to measure page load times. |
Selenium | Load injector | We use Selenium to execute a portion of our user interactions to simulate a more realistic user experience. In our environment, we run five headless Chrome browsers executing the same user actions as JMeter, but with the added benefit of parsing and executing JavaScript. This allows us to measure different front-end performance metrics (for example, parsing time and DOM load time) along with overall performance. The Selenium browsers send custom analytics to our test systems, where we analyze them further to break down the cost of page load times (see the sketch after this table). |
Ansible | Infrastructure deployment | We use Ansible playbooks to orchestrate the entire testing process, from provisioning the test environment in AWS, all the way to launching the load injectors and collecting test data. Orchestration allows us to run the same type of performance tests across multiple builds throughout the development process. |
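To make the Selenium row above more concrete, here's a minimal sketch of driving headless Chrome and pulling browser-side navigation timings, the kind of front-end metric JMeter alone cannot observe. The instance URL and the specific metrics extracted are assumptions for illustration, not our production harness code.

```python
# A minimal sketch of the Selenium side of the harness, assuming headless Chrome
# and the W3C Navigation Timing API. The URL and the metrics extracted here are
# illustrative; the real harness also ships these numbers onward for analysis.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")  # requires a recent Chrome

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://confluence.example.com/dashboard.action")  # hypothetical instance

    # Pull browser-side timings that a pure HTTP load injector cannot see.
    timing = driver.execute_script("""
        const nav = performance.getEntriesByType("navigation")[0];
        return {
            domContentLoaded: nav.domContentLoadedEventEnd - nav.startTime,
            load: nav.loadEventEnd - nav.startTime,
        };
    """)
    print(f"DOM content loaded: {timing['domContentLoaded']:.0f} ms, "
          f"full load: {timing['load']:.0f} ms")
finally:
    driver.quit()
```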
Workload
We modeled our enterprise readiness testing workload after traffic and content on our own internal Confluence Data Center instances. Namely, it's the one we discussed previously in Confluence Data Center sample deployment and monitoring strategy. In terms of data volume and HTTP traffic, this instance's workload is one of the 10 heaviest we've ever seen in production.
Our load injectors generate transactions made up of multiple HTTP requests to simulate different user actions. JMeter and Selenium work together to chain those user actions into Confluence's business-critical workflows.
Our test harness, as a whole, generates 19,171 transactions per hour (5.3 transactions per second). This produces a throughput equivalent to 431,000 HTTP requests per hour (or 120 per second).
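As a quick check, the per-second rates follow directly from the hourly totals; the snippet below is illustrative arithmetic on the figures quoted above, and the per-transaction ratio is simply derived from them.

```python
# Illustrative arithmetic on the throughput figures quoted above (not measured output).
transactions_per_hour = 19_171
requests_per_hour = 431_000

print(transactions_per_hour / 3600)               # ~5.3 transactions per second
print(requests_per_hour / 3600)                   # ~120 HTTP requests per second
print(requests_per_hour / transactions_per_hour)  # ~22 HTTP requests per transaction
```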
Data set
Like our workload, we modeled our test instance's data set after a snapshot of the instance described in Confluence Data Center sample deployment and monitoring strategy. This snapshot has the following dimensions:
Dimension | Value (Approx.) |
---|---|
Total Spaces | 6,550 |
Site Spaces | 1,500 |
Personal Spaces | 5,000 |
Content (All Versions) | 16,000,000 |
Content (Current Versions) | 6,900,000 |
Comments | 2,000,000 |
Local Users | 12,300 |
Local Groups | 9,900 |
Execution
Once our test harness provisions the test environment via Ansible, the enterprise readiness test proceeds in the following manner:
Warm-up
We perform the test on a cold application running on newly provisioned infrastructure. As such, Confluence needs to warm up before providing useful results. We use the first 15 minutes of the test as our warm-up period, and we discard all test data from this time. We also use this time to start logging in 360 active users, through whom we simulate every user transaction. All active users remain logged in to Confluence for the duration of the test, and all user actions are triggered through them.
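As a simple illustration of how the warm-up window could be excluded when results are aggregated, here's a minimal sketch; the sample shape (timestamp plus elapsed milliseconds) and the timestamps themselves are assumptions, while the 15-minute cut-off matches the warm-up period above.

```python
# A hedged sketch of discarding warm-up samples when aggregating results,
# assuming each sample is a (timestamp, elapsed_ms) pair from the load injectors.
from datetime import datetime, timedelta

WARM_UP = timedelta(minutes=15)  # matches the warm-up period described above

def discard_warm_up(samples, test_start):
    """Keep only samples recorded after the warm-up window."""
    cutoff = test_start + WARM_UP
    return [(ts, elapsed_ms) for ts, elapsed_ms in samples if ts >= cutoff]

# Example: two samples inside the warm-up window, one after it (illustrative values).
start = datetime(2019, 6, 1, 9, 0)
samples = [
    (start + timedelta(minutes=5), 410),
    (start + timedelta(minutes=14), 380),
    (start + timedelta(minutes=20), 355),
]
print(discard_warm_up(samples, start))  # only the 20-minute sample survives
```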
Peak load
After warm-up, we apply the test workload for two hours. This involves assigning each active user to a workflow group, in which they execute a set of user actions. Each workflow group is modeled on Confluence's business-critical workflows.
We use JMeter to trigger each user action:
Workflow group | User actions |
---|---|
Create Page | Login → View Dashboard → Search Page → View Page → Like Page → Create Page → Restrict Page → Add label → Logout |
Edit Page | Login → View Dashboard → Search Page → View Page → Edit Page → View Page → Upload Attachment → Logout |
View Blog | Login → View Dashboard → Search Blog → View Blog → Like Blog → View Inline Comments → Upload Attachments → Create Comments → Logout |
View/Update Inline Comments | Login → View Dashboard → View Page → View Inline Comments → Logout |
Create Blog | Login → View Dashboard → Create Blog → Upload Attachments → Create Blog → Create Inline Comments → Logout |
Create Comments | Login → View Dashboard → Search Page → View Page → Create Page Comments → Create Inline Comments → Logout |
At the same time, Selenium will run a generic workflow with the following actions:
Login → View Dashboard → Create Page → Create Inline Comments → View Recently Viewed Page → Edit Page → View Blog → View Page → View Popular Page → Logout
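As an illustration of how such a workflow can be modeled in a harness, here's a hedged sketch that runs an ordered list of user actions with randomized think time between steps. The executor and the think-time range are assumptions, and the action names simply mirror the generic workflow above.

```python
# A minimal sketch of a workflow as an ordered list of user actions with think
# time between steps. The action names mirror the generic workflow above; the
# executor and think-time range are illustrative assumptions.
import random
import time

GENERIC_WORKFLOW = [
    "login", "view_dashboard", "create_page", "create_inline_comments",
    "view_recently_viewed_page", "edit_page", "view_blog", "view_page",
    "view_popular_page", "logout",
]

def run_workflow(actions, execute, think_time=(2.0, 8.0)):
    """Execute each user action in order, pausing a random think time between steps."""
    for action in actions:
        execute(action)
        time.sleep(random.uniform(*think_time))

# Example executor; in a real harness this would drive JMeter or Selenium actions.
run_workflow(GENERIC_WORKFLOW,
             execute=lambda action: print(f"executing {action}"),
             think_time=(0.1, 0.3))  # short think time just for this demo
```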
Local, active, and concurrent users
For the purposes of performance testing, we differentiate between three types of users:
- Local: covers all users with login information on the Confluence instance. Our test instance has 12,300 of them.
- Active: these are all the users who are currently logged in. We also perform all of our test actions through these users. Throughout the test, we execute 19,171 user actions per hour across all 360 active users.
- Concurrent: these are all active users who simultaneously trigger a user action. We set different think times between user actions, resulting in an average of 8 and a maximum of 29 concurrent users throughout the test.
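One way to derive such a concurrency figure from load-injector logs is to count how many user actions are in flight at the same moment. The sketch below is illustrative only, with made-up intervals.

```python
# A hedged sketch of measuring concurrency from load-injector logs, assuming
# each user action is recorded as a (start, end) interval in seconds.
def concurrent_users(intervals):
    """Return the peak number of overlapping user actions."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    peak = current = 0
    for _, delta in sorted(events):
        current += delta
        peak = max(peak, current)
    return peak

# Illustrative sample: four actions, at most two of which overlap.
actions = [(0.0, 1.2), (0.5, 2.0), (1.8, 2.5), (3.0, 3.4)]
print(concurrent_users(actions))  # prints 2
```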
Collecting and analyzing results
We use an internal InfluxDB server to aggregate and collect test data in real time. This server allows us to actively monitor results even while a test is underway, and also to compare results with historical data. Our test harness uses the following tools to collect these metrics and send them to InfluxDB:
Tool | Description |
---|---|
Amazon CloudWatch | Deploying our test environment in AWS allows us to use CloudWatch to collect detailed resource usage metrics from each node. |
Telegraf | We install Telegraf agents on our application nodes to monitor and collect data on Confluence. These include JVM metrics, garbage collection stats, Hibernate stats, query counts, caching, database pooling, and more. |
JMeter plug-in | This component parses and creates graphs out of the data sent to InfluxDB. This allows us to visualize different types of traffic data: throughput, active threads, success/error rates, and transaction response times. |
Custom tools | We developed a series of scripts to send browser navigation timings and performance results directly to InfluxDB. |
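As a hedged illustration of the custom tools row above, here's what pushing a single browser navigation timing into InfluxDB could look like with the influxdb-python client. This assumes an InfluxDB 1.x server; the host, database, tags, and values are placeholders, not our real scripts.

```python
# A hedged illustration of sending browser navigation timings to InfluxDB,
# assuming the influxdb-python client and an InfluxDB 1.x server.
# Host, database, tag values, and field values are placeholders.
from datetime import datetime, timezone

from influxdb import InfluxDBClient

client = InfluxDBClient(host="influxdb.internal.example", port=8086,
                        database="perf_tests")

point = {
    "measurement": "browser_navigation_timing",
    "tags": {"build": "example-build", "workflow": "generic"},  # hypothetical tags
    "time": datetime.now(timezone.utc).isoformat(),
    "fields": {"dom_content_loaded_ms": 842.0, "load_event_ms": 1530.0},  # sample values
}
client.write_points([point])
```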
We built several Grafana-based dashboards to visualize different areas of the InfluxDB data. We also send all logs from the AWS nodes and load injectors to a central Splunk server, where we can thoroughly investigate any concerning events.
We're here to help
If you have any questions about our methodology, reach out to our Advisory Services or Premier Support team.