Troubleshooting performance with Jira Stats

Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Intro

Jira writes a number of stats logs (metrics) that can be used to troubleshoot different performance problems. The goal of this page is to give an overview of the metrics that can be used to measure Jira performance or to gauge the performance of the environment (disk, network, or database).

For more details about the types of stats logs, see the related article: Jira stats logs

Note about the Healthy state column: its values are based on data from our internal Jira instances and are provided only as a general reference.

Applicable to versions: 8.13+

DC Specific

LOCALQ, DBR, and INDEX-REPLAY logging is present only in Jira Data Center.

Disk performance

Why this metric is important:

  • slow IO affects user requests related to Lucene search (JQL) and index updates (issue Create/Update)

  • slow IO affects node-to-node cache communication (slow updates of the LocalQ files)

  • slow IO affects issue reindexing time: Full Reindex, Background, and Project reindex

Metrics to check:

Stats | Metric | Description | Healthy state
[LOCALQ] [VIA-INVALIDATION] | timeToAddMillis | Time to store a cache replication message in the local store before sending it to the other node. Reflects LocalQ write performance, which mostly depends on the disk. | ~1 ms
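To get an independent sense of disk latency on a node, you can time small durable writes on the volume that holds the LocalQ files (assumed here to be the Jira local home) and compare the result with the ~1 ms reference for timeToAddMillis. This is a rough sketch under stated assumptions, not how Jira measures LocalQ: it writes 4 KiB blocks and calls `os.fsync` after each one, and the helper name `fsync_write_latencies_ms` is hypothetical. LocalQ's real write pattern may differ, so treat the numbers only as a rough comparison.

```python
import os
import statistics
import tempfile
import time


def fsync_write_latencies_ms(directory: str, samples: int = 20, block_size: int = 4096) -> list[float]:
    """Time `samples` small write+fsync operations in `directory`, in milliseconds.

    Hypothetical helper: a crude disk-latency probe, not Jira's own LocalQ metric.
    """
    payload = b"\0" * block_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)  # scratch file on the volume under test
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # wait until the data is flushed to stable storage
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)  # clean up the scratch file
    return latencies


if __name__ == "__main__":
    # Replace tempfile.gettempdir() with a directory on the volume holding the LocalQ files.
    results = fsync_write_latencies_ms(tempfile.gettempdir())
    print(f"median {statistics.median(results):.2f} ms, max {max(results):.2f} ms")
```

If the median is consistently well above ~1 ms, slow disk IO may be a contributor to high timeToAddMillis values.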

DB performance

Why these metrics are important:

  • slow IO affects user requests related to issue Create/Update, viewing a single issue, and generating reports

  • slow IO affects cache population

  • slow IO affects issue reindexing time: Full Reindex, Background, and Project reindex

Metrics to check:

Stats | Metric | Description | Healthy state
[VERSIONING] | getIssueVersionMillis | Time to read a single row in the version table. This is expected to be a fast operation; it includes processing + network + DB read time. | ~5 ms
[VERSIONING] | incrementIssueVersionMillis | Time to update (increment) a single row in the version table. This is expected to be a fast operation; it includes processing + network + DB update time. | ~10 ms
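Because both [VERSIONING] metrics include network time, it can help to measure the network round trip to the database host separately. The sketch below times TCP connection setup, which completes after roughly one round trip, as a crude RTT estimate. The helper `tcp_connect_ms` is hypothetical, the host and port come from the command line, and connection setup is not a query, so this approximates only the network component.

```python
import socket
import statistics
import sys
import time


def tcp_connect_ms(host: str, port: int, samples: int = 5, timeout: float = 2.0) -> list[float]:
    """Time TCP connection setup to host:port, in milliseconds.

    A connect completes after roughly one network round trip (SYN / SYN-ACK),
    so this is a crude RTT estimate. Hypothetical helper, not part of Jira.
    If `host` is a name, each sample also includes DNS resolution; pass an
    IP address to exclude it.
    """
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # close immediately; no database protocol traffic is sent
        results.append((time.perf_counter() - start) * 1000.0)
    return results


if __name__ == "__main__":
    # Usage, run from a Jira node: python rtt_probe.py <db-host> <db-port>
    if len(sys.argv) == 3:
        rtts = tcp_connect_ms(sys.argv[1], int(sys.argv[2]))
        print(f"median connect time {statistics.median(rtts):.2f} ms")
```

For example, if connect time is around 1 ms while getIssueVersionMillis stays far above ~5 ms, the network is unlikely to be the main contributor, and DB or processing time deserves a closer look.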

Internode network latency

Why these metrics are important:

  • slow IO affects cache replication (cluster consistency)

  • slow IO affects cluster index replication (DBR)

Metrics to check:

Stats | Metric | Description | Healthy state
[LOCALQ] [VIA-INVALIDATION] | timeToSendMillis | Time to deliver a cache replication message from the current node to the destination node. | ~10 ms
[LOCALQ] [VIA-INVALIDATION] | queueSize | Size of the cache replication queue. | 0
[DBR] [RECEIVER] | receiveDBRMessageDelayedInMillis | Time difference between generating the message and receiving it. Each timestamp comes from its node's local clock, so the value also includes any clock drift between the nodes. Includes serialization/deserialization + time spent in the LocalQ + time to send the message (RTT). | ~100 ms
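Because receiveDBRMessageDelayedInMillis compares a timestamp from the sending node's clock with one from the receiving node's clock, any offset between the two clocks is added to the reported delay. The small worked example below illustrates this. The function names are hypothetical, and the clock offset is assumed to be measured separately (for example, with your time-synchronization tooling).

```python
def observed_delay_ms(true_delay_ms: float, receiver_clock_offset_ms: float) -> float:
    """What a cross-node delay metric reports when the clocks disagree.

    The send time is read from the sender's clock and the receive time from the
    receiver's clock, so their offset is added to the real delay.
    receiver_clock_offset_ms > 0 means the receiver's clock is ahead.
    """
    return true_delay_ms + receiver_clock_offset_ms


def corrected_delay_ms(observed_ms: float, receiver_clock_offset_ms: float) -> float:
    """Estimate the real delay, given an independently measured clock offset."""
    return observed_ms - receiver_clock_offset_ms


if __name__ == "__main__":
    # Real delay 90 ms, receiver clock 60 ms ahead: the metric shows 150 ms,
    # which looks unhealthy against ~100 ms even though delivery was fine.
    print(observed_delay_ms(90, 60))    # 150
    print(corrected_delay_ms(150, 60))  # 90
```

Before treating values well above ~100 ms as a network problem, it is worth confirming that the nodes' clocks are in sync.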

Index reads/writes (Lucene)

Why these metrics are important:

  • slow operations affect user requests related to issue Create/Update/Transition

  • slow operations affect issue reindexing time: Full Reindex, Background, and Project reindex

In the following table, you'll find the metrics you should check.

In the Healthy value column, the values for [index-writer-stats], [DBR] [RECEIVER], and [LOCALQ] [VIA-COPY] are approximate averages around which healthy values are expected. As long as these values are not greatly exceeded, the system keeps working optimally.

Stats | Metric | Description | Healthy value
See Disk performance
See DB performance
[index-writer-stats] | updateDocumentsWithVersionMillis | Time spent on all steps required to conditionally add/update the index in Lucene (search + Lucene update). | avg ~10 ms
[DBR] [RECEIVER] | processDBRMessageUpdateWithRelatedIndexInMillis | Time for the complex object to wait in the Lucene queue and be written to the index. Includes waiting in the queue + conditionally updating Lucene. | avg ~30 ms
[LOCALQ] [VIA-COPY] | timeToSendMillis | Time to deliver and apply a cache replication message from the current node to the destination node. Includes serialization/deserialization, time to send the message (RTT), acknowledgement from the receiver, and the Lucene update. | avg ~50 ms
[lucene-stats] | flushIntervalMillis | Average time between flushes since the last snapshot. This metric is mostly relevant to foreground reindexing; short intervals may indicate that the Lucene buffer needs to be increased. | avg > 1 sec (during foreground indexing)
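When reviewing this table, note that flushIntervalMillis works in the opposite direction from the other metrics: a higher average is healthier. The sketch below encodes the table's reference values together with that direction. The `factor` tolerance standing in for "greatly exceeded" is an arbitrary assumption (3x here), the dictionary keys are labels chosen for this example, and the averages are assumed to have already been extracted from the stats logs (parsing is not shown).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    value_ms: float
    higher_is_better: bool = False


# Approximate healthy averages from the Index reads/writes (Lucene) table.
REFERENCES = {
    "updateDocumentsWithVersionMillis": Reference(10.0),
    "processDBRMessageUpdateWithRelatedIndexInMillis": Reference(30.0),
    "VIA-COPY timeToSendMillis": Reference(50.0),
    # Only meaningful during foreground indexing; shorter intervals are worse.
    "flushIntervalMillis": Reference(1000.0, higher_is_better=True),
}


def needs_attention(metric: str, avg_ms: float, factor: float = 3.0) -> bool:
    """Return True if an observed average looks worth investigating.

    Latency metrics are flagged only when they exceed `factor` x the reference
    (an assumed tolerance for "greatly exceeded"); flushIntervalMillis is
    flagged when it falls below its reference.
    """
    ref = REFERENCES[metric]
    if ref.higher_is_better:
        return avg_ms < ref.value_ms
    return avg_ms > factor * ref.value_ms


if __name__ == "__main__":
    print(needs_attention("updateDocumentsWithVersionMillis", 12.0))  # False
    print(needs_attention("flushIntervalMillis", 400.0))  # True
```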

Indexing performance

Why these metrics are important:

  • slow operations affect user requests related to issue Create/Update/Transition

  • slow operations affect issue reindexing time: Full Reindex, Background, and Project reindex

  • slow operations affect cluster consistency

Metrics to check:

Stats | Metric | Description | Healthy state
See Disk performance
See DB performance
See Index reads/writes (Lucene)
[indexing-stats] | addIndex.avg | Average time to load the data from the custom field provider. This directly affects indexing time, including every request that creates or updates issues and comments, and the full reindex time. | Depends on the custom field functionality
[INDEX-REPLAY] [STATS] | timeInMillis | Time to process a batch of index operations; the replay process runs every 5 seconds. | ~5 sec
[INDEX-REPLAY] [STATS] | updateIndexInMillis | Time to perform all index operations still required after checking DBR (document creation + Lucene update) in a given batch. | ~5 sec
[INDEX-REPLAY] [STATS] | DBR effectiveness | DBR effectiveness = 1 - (filterOutAlreadyIndexedAfterCounter.ISSUE.sum / filterOutAlreadyIndexedBeforeCounter.ISSUE.sum) | > 90%
[INDEX-REPLAY] [STATS] | numberOfRemoteOperations | Number of processed index operations from other nodes; helps check traffic distribution and cluster load. | n/a
[INDEX-REPLAY] [STATS] | numberOfLocalOperations | Number of processed index operations from the current node; helps check traffic distribution and cluster load. | n/a
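The DBR effectiveness formula and the local/remote operation counts can be computed directly from the [INDEX-REPLAY] [STATS] values. This is a minimal sketch, assuming you have already read the counter sums from the stats log. The helper names are hypothetical, and the expectation of roughly 1/N local operations on an evenly balanced N-node cluster is a rule of thumb assumed here, not an Atlassian threshold.

```python
def dbr_effectiveness(before_sum: int, after_sum: int) -> float:
    """DBR effectiveness = 1 - after / before, as defined in the Indexing performance table.

    before_sum: filterOutAlreadyIndexedBeforeCounter.ISSUE.sum
    after_sum:  filterOutAlreadyIndexedAfterCounter.ISSUE.sum
    """
    if before_sum <= 0:
        raise ValueError("before_sum must be positive")
    return 1.0 - after_sum / before_sum


def local_share(local_ops: int, remote_ops: int) -> float:
    """Fraction of replayed index operations that originated on this node."""
    total = local_ops + remote_ops
    return local_ops / total if total else 0.0


if __name__ == "__main__":
    eff = dbr_effectiveness(before_sum=2000, after_sum=120)
    print(f"DBR effectiveness: {eff:.0%}")  # 94%, above the > 90% reference
    # On a 4-node cluster with evenly balanced traffic, roughly 25% is expected.
    print(f"local share: {local_share(local_ops=300, remote_ops=900):.0%}")  # 25%
```

A node whose local share is far above 1/N may be receiving a disproportionate share of user traffic, which is worth checking against the load balancer configuration.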




Last modified on Nov 19, 2024
