Automation for Jira: Monitoring Automation Queue
Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.
Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Automation rules requiring asynchronous execution are run in the background by adding and processing events from the automation queue. Any system slowness, time-consuming rules, or burst of incoming events can cause events to pile up in the automation queue.
Starting with Jira 10.2, we’ve introduced the following alerts, metrics, and APIs to improve monitoring and investigation of the automation queue. Identifying and investigating queue processing issues early enables timely action and limits the impact of delayed automation rule execution.
Jira 10.2 enhancements
The automation queue count is now available to monitor the processing of the automation queue. This count is used to raise alerts, record JMX metrics, and gather instrumentation statistics. Additionally, Jira statistics have been introduced to aid in monitoring and troubleshooting the automation queue processing and rule execution.
Alert on automation queue count exceeding the threshold
We’ve introduced the following alert for monitoring the automation queue in the app diagnostics framework, accessible at <JIRA_URL>/plugins/servlet/diagnostics/overview.
An alert is raised in diagnostics when the automation queue count exceeds a configurable threshold, indicating a buildup of unprocessed events in the automation queue. The default threshold value is 10,000 and can be changed using the Jira config property jira.diagnostics.thresholds.max-automation-queue-size.
JMX metric for automation queue count
A JMX metric for the automation queue count (automation.queue.count.max_stale_10mins) has been added. As the name suggests, this metric may show values captured up to 10 minutes earlier. A value of -1 indicates that the metric could not be captured.
This metric enables monitoring of the automation queue with external APM tools like Prometheus and Grafana. Refer to this documentation for more information on integrating with APM tools.
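As a rough illustration of how a monitoring script might interpret a sample of this metric, the sketch below maps a value to a coarse status. The function name and status strings are hypothetical; only the -1 sentinel and the default alert threshold of 10,000 come from this article, and how the value is fetched from JMX is left out.

```python
DEFAULT_MAX_QUEUE_SIZE = 10_000  # default alert threshold quoted in this article
NOT_CAPTURED = -1                # sentinel: the metric could not be captured


def classify_queue_count(value: int, threshold: int = DEFAULT_MAX_QUEUE_SIZE) -> str:
    """Map one sample of the queue count metric to a coarse status (hypothetical labels)."""
    if value == NOT_CAPTURED:
        # The sample is missing, so it must not be treated as an empty queue.
        return "unknown"
    if value > threshold:
        # The alert fires when the count exceeds the threshold.
        return "over_threshold"
    return "ok"
```

Because the metric can be up to 10 minutes stale, a status derived this way describes the queue as it was when the value was captured, not necessarily its current state.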
Instrumentation statistics for automation queue count
The JMX metric for the automation queue count mentioned above is also available as instrumentation statistics under the same name.
Jira stats for automation queue processing and rule execution
Jira records various stats on every node in a cluster, either in the primary Jira log file (atlassian-jira.log*) or, from Jira 10.4 onwards, in the stats log file (atlassian-jira-stats.log*). These statistics assist in measuring and troubleshooting Jira's performance. Jira logs "total stats" and "snapshots" of the statistics every 5 minutes by default.
The following Jira statistics have been added to capture the addition and processing of events from the automation queue and the execution of automation rules. It is important to note that these statistics are captured independently for each node in the cluster and reflect only the state of that particular node.
These statistics enable the creation of dashboards in logging tools, such as Splunk, to monitor the automation queue's processing rate and troubleshoot issues.
Stats prefix | Description |
---|---|
| Stats measuring the number of events added to the automation queue |
| Stats measuring the number of events picked up for processing |
| Stats measuring the number of events processed |
| Stats measuring the number of rules executed |
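
Because these statistics are recorded per node, a dashboard usually has to combine them before a cluster-wide processing rate can be read off. The sketch below assumes the "added" and "processed" counts for one 5-minute snapshot window have already been extracted from each node's log; the NodeSnapshot type, its field names, and the sample numbers are hypothetical and do not reflect the actual log format.

```python
from dataclasses import dataclass

SNAPSHOT_SECONDS = 300  # Jira logs stats snapshots every 5 minutes by default


@dataclass
class NodeSnapshot:
    node: str       # hypothetical node identifier
    added: int      # events added to the queue during the snapshot window
    processed: int  # events processed during the snapshot window


def net_growth_per_second(snapshots: list[NodeSnapshot]) -> float:
    """Sum per-node counts and return (added - processed) per second for the window."""
    added = sum(s.added for s in snapshots)
    processed = sum(s.processed for s in snapshots)
    return (added - processed) / SNAPSHOT_SECONDS


snapshots = [
    NodeSnapshot("node1", added=1200, processed=900),
    NodeSnapshot("node2", added=600, processed=600),
]
print(net_growth_per_second(snapshots))  # 1.0
```

A positive result means events were added faster than they were processed during that window. The calculation is only meaningful when all snapshots cover the same window.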
Jira 10.4 enhancements
In addition to the automation queue count, the age of the earliest unprocessed event is now available and is used for raising alerts, recording JMX metrics, and gathering instrumentation statistics. New APIs have also been introduced to provide insight into the automation queue and audit logs, helping troubleshoot issues related to the automation queue's growth.
Alert on automation queue experiencing processing delays
An alert is raised in diagnostics when an unprocessed event in the automation queue is older than a configurable threshold, indicating longer processing times for events. The default threshold value is 3600 seconds, and it can be changed using the Jira config property jira.diagnostics.thresholds.earliest-unprocessed-item-seconds.
JMX metric for age of earliest unprocessed event in automation queue
A JMX metric for the age of the earliest unprocessed event in the automation queue (automation.queue.earliest.unprocessed.message.age.secs.max_stale_10mins) has been added. Similar to the metric for automation queue count, this metric may show values captured up to 10 minutes earlier. A value of -1 indicates that the metric could not be captured.
This metric also enables monitoring of the automation queue with external APM tools like Prometheus and Grafana. Refer to this documentation for more information on integrating with APM tools.
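The age metric is reported in seconds, which can be awkward to read on a dashboard. The following sketch turns a sample into a short human-readable message and compares it with the default 3600-second alert threshold. The function name and message wording are hypothetical, and retrieval of the value from JMX is not shown.

```python
from datetime import timedelta

DEFAULT_MAX_AGE_SECONDS = 3600  # default alert threshold quoted in this article


def describe_oldest_event_age(age_seconds: int, threshold: int = DEFAULT_MAX_AGE_SECONDS) -> str:
    """Render one sample of the age metric as a short message (hypothetical wording)."""
    if age_seconds == -1:
        # -1 means the metric could not be captured, not that the queue is empty.
        return "age not captured"
    state = "delayed" if age_seconds > threshold else "within threshold"
    return f"oldest unprocessed event is {timedelta(seconds=age_seconds)} old ({state})"


print(describe_oldest_event_age(5400))
# oldest unprocessed event is 1:30:00 old (delayed)
```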
Instrumentation statistics for age of earliest unprocessed event in automation queue
The JMX metric for the earliest unprocessed event age mentioned above is also available as instrumentation statistics under the same name.
APIs to get insight into the automation queue
The following APIs have been introduced to help analyze the contents of the automation queue by event type or rule ID. These APIs provide information on the current queue size and group the events in the automation queue by event type or rule ID.
These APIs are available for Jira admins and are particularly useful for troubleshooting issues related to automation queue growth. They help identify specific event types or rules that may be contributing to the increase in queue size.
It's important to note that there may be a group with a null rule ID / rule name or a null event type. Events with a null rule ID / rule name are common and indicate that these events are added to the automation queue without being associated with any specific rule. The evaluation of which rules need execution occurs during event processing. Events with a null event type are added in situations such as the processing of outgoing webhooks, where the event type is not included in the automation queue record.
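To make the grouping behaviour concrete, including the null groups described above, the sketch below counts queue records by rule ID or by event type and returns the largest groups. It is a local illustration only: the record shape, the event type names, and the rule ID are hypothetical and do not represent the actual API request or response format.

```python
from collections import Counter
from typing import Optional

# Hypothetical queue records as (rule_id, event_type) pairs; None stands in for null.
QueueRecord = tuple[Optional[int], Optional[str]]


def top_groups(records: list[QueueRecord], by: str, limit: int = 3) -> list[tuple[object, int]]:
    """Count records by 'rule_id' or 'event_type' and return the largest groups first."""
    index = {"rule_id": 0, "event_type": 1}[by]
    counts = Counter(record[index] for record in records)
    return counts.most_common(limit)


records = [
    (None, "issue_updated"),  # not yet associated with any rule
    (None, "issue_updated"),
    (42, "issue_created"),
    (None, None),             # e.g. an outgoing webhook with no event type recorded
]
print(top_groups(records, by="rule_id"))  # [(None, 3), (42, 1)]
```

In this made-up sample the null rule ID group is the largest, which matches the note above that such events are common and are not by themselves a sign of a problem.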
API to get insight into audit logs
The following API has been introduced to help analyze the audit logs of automation rule execution by event source. This API provides audit logs grouped by event source for a specified timeframe.
This API is also available for Jira admins and is particularly useful for identifying event sources that have resulted in a high number of rule executions and may have contributed to the growth of the automation queue.
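The kind of summary this enables can be sketched locally as follows: audit entries are filtered to a timeframe and counted per event source. The entry shape, the source names, and the half-open time window are assumptions made for illustration and are not the API's actual parameters or response format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical audit entries as (executed_at, event_source) pairs.
AuditEntry = tuple[datetime, str]


def executions_by_source(entries: list[AuditEntry], start: datetime, end: datetime) -> Counter:
    """Count rule executions per event source within the window [start, end)."""
    return Counter(source for executed_at, source in entries if start <= executed_at < end)


entries = [
    (datetime(2025, 1, 1, 9, 0), "source-a"),
    (datetime(2025, 1, 1, 9, 5), "source-a"),
    (datetime(2025, 1, 1, 9, 7), "source-b"),
    (datetime(2025, 1, 1, 11, 0), "source-a"),  # outside the window below
]
window = executions_by_source(entries, datetime(2025, 1, 1, 9, 0), datetime(2025, 1, 1, 10, 0))
print(window.most_common(1))  # [('source-a', 2)]
```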