Automation for Jira: Monitoring Automation Queue

Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.

Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Automation rules requiring asynchronous execution are run in the background by adding and processing events from the automation queue. Any system slowness, time-consuming rules, or burst of incoming events can cause events to pile up in the automation queue.

Starting with Jira 10.2, we’ve introduced the following alerts, metrics, and APIs to improve monitoring and investigation of the automation queue. Identifying and investigating queue processing issues early enables timely action and avoids further impact from delayed automation rule execution.

Jira 10.2 enhancements

The automation queue count is now available to monitor the processing of the automation queue. This count is used to raise alerts, record JMX metrics, and gather instrumentation statistics. Additionally, Jira statistics have been introduced to aid in monitoring and troubleshooting the automation queue processing and rule execution.

Alert on automation queue count exceeding the threshold

We’ve introduced the following alert for monitoring the automation queue in the app diagnostics framework, accessible at <JIRA_URL>/plugins/servlet/diagnostics/overview.

An alert is raised in diagnostics when the automation queue count exceeds a configurable threshold, indicating a buildup of unprocessed events in the automation queue. The default threshold is 10,000 and can be changed using the Jira config property jira.diagnostics.thresholds.max-automation-queue-size.

JMX metric for automation queue count

A JMX metric for the automation queue count (automation.queue.count.max_stale_10mins) has been added. As the name suggests, this metric may show values captured up to 10 minutes earlier. A value of -1 indicates that the metric could not be captured.
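As a rough illustration of how a monitoring script might interpret this metric, the sketch below treats -1 as "not captured" and compares any other value against the documented default threshold of 10,000. The function name and the returned labels are hypothetical; only the -1 sentinel and the default threshold come from this article.

```python
# Documented default for jira.diagnostics.thresholds.max-automation-queue-size.
DEFAULT_MAX_QUEUE_SIZE = 10_000


def classify_queue_count(value: int, threshold: int = DEFAULT_MAX_QUEUE_SIZE) -> str:
    """Classify a reading of automation.queue.count.max_stale_10mins.

    The labels returned here are illustrative, not part of Jira.
    """
    if value == -1:
        # -1 means the metric could not be captured.
        return "unknown"
    if value > threshold:
        # The diagnostics alert is raised when the count exceeds the threshold.
        return "over-threshold"
    return "ok"


if __name__ == "__main__":
    for reading in (-1, 285, 10_000, 12_500):
        print(reading, classify_queue_count(reading))
```

Because the metric may be up to 10 minutes stale, a result of "ok" describes the queue as it was when the value was captured, not necessarily its current state.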

This metric enables monitoring of the automation queue with external APM tools like Prometheus and Grafana. Refer to this documentation for more information on integrating with APM tools.

Instrumentation statistics for automation queue count

The JMX metric for the automation queue count mentioned above is also available as instrumentation statistics under the same name.

Jira stats for automation queue processing and rule execution

Jira records various stats on every node in a cluster, in the primary Jira log file (atlassian-jira.log*) or, from Jira 10.4 onwards, in the stats log file (atlassian-jira-stats.log*). These statistics assist in measuring and troubleshooting Jira's performance. By default, Jira logs "total stats" and "snapshots" of the statistics every 5 minutes.

The following Jira statistics have been added to capture the addition and processing of events from the automation queue and the execution of automation rules. It is important to note that these statistics are captured independently for each node in the cluster and reflect only the state of that particular node.

These statistics enable the creation of dashboards in logging tools, such as Splunk, to monitor the automation queue's processing rate and troubleshoot issues.

Stats prefix                                Description

[AUTOMATION-EVENT] [eventsAddedStats]       Stats measuring the number of events added to the automation queue

[AUTOMATION-EVENT] [eventsClaimedStats]     Stats measuring the number of events picked up for processing

[AUTOMATION-EVENT] [eventsProcessedStats]   Stats measuring the number of events processed

[AUTOMATION-RULE] [rulesExecutedStats]      Stats measuring the number of rules executed
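Before building a dashboard, it can help to pull the relevant lines out of a node's log file. The following is a minimal sketch that groups log lines by the four stats prefixes listed above. The function name is hypothetical, and the sample lines use a placeholder payload because the exact layout of the stats entries is not described in this article.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Prefixes as listed in this article.
STATS_PREFIXES = (
    "[AUTOMATION-EVENT] [eventsAddedStats]",
    "[AUTOMATION-EVENT] [eventsClaimedStats]",
    "[AUTOMATION-EVENT] [eventsProcessedStats]",
    "[AUTOMATION-RULE] [rulesExecutedStats]",
)


def group_stats_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group log lines by the automation stats prefix they contain."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        for prefix in STATS_PREFIXES:
            if prefix in line:
                grouped[prefix].append(line.rstrip("\n"))
                break  # a line is attributed to at most one prefix
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical lines; "<payload>" stands in for the real stats content.
    sample_lines = [
        "<timestamp> [AUTOMATION-EVENT] [eventsAddedStats] <payload>",
        "<timestamp> [AUTOMATION-RULE] [rulesExecutedStats] <payload>",
        "<timestamp> unrelated log entry",
    ]
    for prefix, matched in group_stats_lines(sample_lines).items():
        print(prefix, len(matched))
```

Since the statistics are recorded per node, the same grouping would need to be run against the log files of every node to get a cluster-wide picture.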

Jira 10.4 enhancements

In addition to the automation queue count, the age of the earliest unprocessed event is now available and is used for raising alerts, recording JMX metrics, and gathering instrumentation statistics. New APIs have also been introduced to provide insight into the automation queue and audit logs, helping troubleshoot issues related to the automation queue's growth.

Alert on automation queue experiencing processing delays

An alert is raised in diagnostics when an unprocessed event in the automation queue is older than a configurable threshold, indicating longer processing times for events. The default threshold is 3600 seconds and can be changed using the Jira config property jira.diagnostics.thresholds.earliest-unprocessed-item-seconds.

JMX metric for age of earliest unprocessed event in automation queue

A JMX metric for age of the earliest unprocessed event in the automation queue (automation.queue.earliest.unprocessed.message.age.secs.max_stale_10mins) has been added. Similar to the metric for automation queue count, this metric may show values captured up to 10 minutes earlier. A value of -1 indicates that the metric could not be captured.
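A value in seconds is easier to read in a report when shown as a duration. The sketch below formats a reading of this metric and flags it against the documented default threshold of 3600 seconds. The function name and message wording are hypothetical; the -1 sentinel and the default threshold are taken from this article.

```python
from datetime import timedelta

# Documented default for
# jira.diagnostics.thresholds.earliest-unprocessed-item-seconds.
DEFAULT_MAX_AGE_SECONDS = 3600


def describe_earliest_event_age(
    age_secs: int, threshold_secs: int = DEFAULT_MAX_AGE_SECONDS
) -> str:
    """Return a human-readable summary of the earliest unprocessed event age."""
    if age_secs == -1:
        # -1 means the metric could not be captured.
        return "unknown: metric could not be captured"
    # The alert fires when the event is older than the threshold.
    status = "delayed" if age_secs > threshold_secs else "ok"
    return f"{status}: earliest unprocessed event is {timedelta(seconds=age_secs)} old"


if __name__ == "__main__":
    for reading in (-1, 120, 5400):
        print(describe_earliest_event_age(reading))
```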

This metric also enables monitoring of the automation queue with external APM tools like Prometheus and Grafana. Refer to this documentation for more information on integrating with APM tools.

Instrumentation statistics for age of earliest unprocessed event in automation queue

The JMX metric for the earliest unprocessed event age mentioned above is also available as instrumentation statistics under the same name.

APIs to get insight of automation queue

The following APIs have been introduced to help analyze the contents of the automation queue. They return the current queue size and group the events in the queue by event type or by rule ID.

These APIs are available for Jira admins and are particularly useful for troubleshooting issues related to automation queue growth. They help identify specific event types or rules that may be contributing to the increase in queue size.

Resources
  • Automation queue insight by event type

    • Endpoint

      GET /rest/cb-automation/latest/insight/automation-queue/by-event-type
    • Optional query param

      • limit: the number of groups to return; optional, with a default of 20
    • cURL with params

      curl --request GET \
        --url 'https://{baseurl}/rest/cb-automation/latest/insight/automation-queue/by-event-type?limit=20' \
        --user 'admin@example.com:<api_token>' \
        --header 'Content-Type: application/json'
    • Response: 200 OK - application/json. Returns insight of the automation queue by event type.

      Example
      {
        "queueSize": 285,
        "queueInsightByEventType": [
          {
            "eventType": "jira:issue_created:issue_created",
            "count": 200
          },
          {
            "eventType": "jira.sla.threshold.trigger",
            "count": 80
          },
          {
            "eventType": null,
            "count": 5
          }
        ],
        "timestamp": "2025-01-22T15:19:56.111Z",
        "limit": 20
      }
  • Automation queue insight by rule

    • Endpoint

      GET /rest/cb-automation/latest/insight/automation-queue/by-rule
    • Optional query param

      • limit: the number of groups to return; optional, with a default of 20

    • cURL with params

      curl --request GET \
        --url 'https://{baseurl}/rest/cb-automation/latest/insight/automation-queue/by-rule?limit=20' \
        --user 'admin@example.com:<api_token>' \
        --header 'Content-Type: application/json'
    • Response: 200 OK - application/json. Returns insight of the automation queue by rule.

      Example
      {
        "queueSize": 285,
        "queueInsightByRule": [
          {
            "ruleId": null,
            "ruleName": null,
            "count": 205
          },
          {
            "ruleId": 16,
            "ruleName": "Email on SLA breach",
            "count": 50
          },
          {
            "ruleId": 18,
            "ruleName": "Auto assign issues",
            "count": 30
          }
        ],
        "timestamp": "2025-01-22T15:19:56.111Z",
        "limit": 20
      }

It's important to note that there may be a group with a null rule ID / rule name or a null event type. Events with a null rule ID / rule name are common and indicate that these events are added to the automation queue without being associated with any specific rule. The evaluation of which rules need execution occurs during event processing. Events with a null event type are added in situations such as the processing of outgoing webhooks, where the event type is not included in the automation queue record.
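When reading these responses in a script, the null groups need explicit handling. The following sketch ranks the groups of a by-rule response by their share of the queue and substitutes a readable label for null names. The function name and the "(no rule)" label are hypothetical; the JSON is the by-rule example response shown above.

```python
import json
from typing import List, Tuple


def summarize_groups(
    response: dict, groups_key: str, label_key: str, null_label: str
) -> List[Tuple[str, int, float]]:
    """Return (label, count, percent of queue) rows, largest group first."""
    queue_size = response["queueSize"]
    rows = []
    for group in response[groups_key]:
        label = group[label_key]
        if label is None:
            # Null keys are expected; see the explanation above.
            label = null_label
        percent = round(group["count"] / queue_size * 100, 1) if queue_size else 0.0
        rows.append((str(label), group["count"], percent))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# The by-rule example response from this article (timestamp and limit omitted).
BY_RULE_EXAMPLE = json.loads("""
{
  "queueSize": 285,
  "queueInsightByRule": [
    {"ruleId": null, "ruleName": null, "count": 205},
    {"ruleId": 16, "ruleName": "Email on SLA breach", "count": 50},
    {"ruleId": 18, "ruleName": "Auto assign issues", "count": 30}
  ]
}
""")

if __name__ == "__main__":
    rows = summarize_groups(BY_RULE_EXAMPLE, "queueInsightByRule", "ruleName", "(no rule)")
    for label, count, percent in rows:
        print(f"{label}: {count} events ({percent}%)")
```

The same function can be applied to a by-event-type response by passing "queueInsightByEventType" and "eventType" as the keys.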

API to get insight of audit logs

The following API has been introduced to help analyze the audit logs of automation rule executions by event source. This API provides audit logs grouped by event source for a specified timeframe.

This API is also available for Jira admins and is particularly useful for identifying event sources that have resulted in a high number of rule executions and may have contributed to the growth of the automation queue.

Resource
  • Audit log insight by event source

    • Endpoint

      GET /rest/cb-automation/latest/insight/audit/by-event-source
    • Optional query params

      • startTime and endTime: the timeframe of the audit logs to analyze; optional, with a default of the last 24 hours; example: 2025-01-22T15:00:00Z

      • limit: the number of groups to return; optional, with a default of 20

    • cURL with params

      curl --request GET \
        --url 'https://{baseurl}/rest/cb-automation/latest/insight/audit/by-event-source?startTime=2025-01-21T15:00:00Z&endTime=2025-01-22T15:00:00Z&limit=20' \
        --user 'admin@example.com:<api_token>' \
        --header 'Content-Type: application/json'
    • Response: 200 OK - application/json. Returns insight of the audit logs by event source.

      Example
      {
        "startTime": 1737471600000,
        "endTime": 1737558000000,
        "limit": 20,
        "results": [
          {
            "eventSource": "jira.issue.event.trigger:created",
            "count": 400
          },
          {
            "eventSource": "jira.jql.scheduled",
            "count": 250
          },
          {
            "eventSource": "jira.sla.threshold.trigger",
            "count": 80
          }
        ]
      }
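The request takes startTime and endTime as ISO 8601 timestamps, while the example response reports them as epoch milliseconds. The sketch below builds the request URL for a window ending at a given time and converts response values back to ISO 8601 for comparison. The function names and the example base URL are hypothetical; the endpoint path and parameter names are taken from this article.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

AUDIT_PATH = "/rest/cb-automation/latest/insight/audit/by-event-source"


def iso_utc(moment: datetime) -> str:
    """Format an aware datetime as ISO 8601 in UTC, e.g. 2025-01-22T15:00:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_audit_url(base_url: str, end: datetime, hours: int = 24, limit: int = 20) -> str:
    """Build the audit insight URL for the window of `hours` ending at `end`."""
    start = end - timedelta(hours=hours)
    query = urlencode(
        {"startTime": iso_utc(start), "endTime": iso_utc(end), "limit": limit},
        safe=":",  # keep colons readable, as in the cURL example
    )
    return f"{base_url}{AUDIT_PATH}?{query}"


def millis_to_iso(epoch_millis: int) -> str:
    """Convert an epoch-millisecond value from the response to ISO 8601 UTC."""
    return iso_utc(datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc))


if __name__ == "__main__":
    end = datetime(2025, 1, 22, 15, 0, tzinfo=timezone.utc)
    # "https://jira.example.com" is a placeholder base URL.
    print(build_audit_url("https://jira.example.com", end))
    print(millis_to_iso(1737471600000))
```

Converting the response values this way makes it straightforward to confirm that the window the server analyzed matches the one that was requested.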





Last modified on Jan 21, 2025
