Indexing inconsistency troubleshooting
In Jira Data Center, nodes share their indexes via the shared home directory. What triggers the creation of an index snapshot, and how the snapshot is used, has changed across Jira versions.
Legacy mode
Until Jira 8.19, every starting node would request an index snapshot from an existing node in the cluster. In this mode, a new node can join the cluster only if all existing nodes have a proper index at the time the new node joins. Many things can go wrong in this scenario, for example:
- the state of the cluster is not up to date and there is no other node that can provide the index
- the node that handles the request to deliver the index has a faulty index
- the node that handles the request to deliver the index fails to create the index snapshot
- the node that handles the request to deliver the index fails to inform the starting node that the index snapshot was created
and other potential problems, which can result in JRASERVER-72125.
Index snapshot - ready on start
JIRA 8.20
In Jira 8.19 we introduced a new way for new nodes to get an index snapshot. When a new node starts, it looks for an index snapshot in the shared home. If the snapshot is fresh enough, the node restores its index from it. Since Jira 8.19, a random node produces an index snapshot every 24 hours (by default).
If the starting node fails to get a usable snapshot from the shared home (there is no snapshot, or the snapshot is not fresh enough), it falls back to legacy mode.
With this change, the chance of running into JRASERVER-72125 was greatly reduced. However, it is still possible that the index snapshot created by the scheduler is inconsistent (for example, when the scheduler runs on a node whose index is currently not consistent).
Index snapshot - quality guaranteed
JIRA 9.0
In Jira 9.0 we made a couple of changes to guarantee the quality of the index snapshots in the shared home.
Index snapshot - location
All index snapshots now use the same file naming scheme regardless of their location:
IndexSnapshot_<unique_number>_<yyMMdd-HHmmss>.<tar.sz|tar|zip>
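The timestamp embedded in the file name tells you when the snapshot was created. A minimal shell sketch to pull it out (the example file name is hypothetical):

# Extract the creation timestamp from a snapshot file name that follows the
# IndexSnapshot_<unique_number>_<yyMMdd-HHmmss> pattern shown above.
snapshot="IndexSnapshot_10123_240115-093000.tar.sz"   # hypothetical example file name
ts="$(echo "$snapshot" | sed -E 's/^IndexSnapshot_[0-9]+_([0-9]{6}-[0-9]{6})\..*$/\1/')"
echo "Snapshot created at: 20${ts:0:2}-${ts:2:2}-${ts:4:2} ${ts:7:2}:${ts:9:2}:${ts:11:2}"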
The index file and snapshot locations have also changed:
<local_home_directory>/caches/indexesV2
stores index files
<local_home_directory>/caches/indexesV2/snapshots
stores index snapshots that were:
- created by scheduled index backups
- retrieved by nodes joining the cluster
- used for snapshot recovery
- replicated to the secondary home directory
<shared_home_directory>/caches/indexesV2/snapshots
stores index snapshots created:
- on the completion of a full reindex (and retrieved by other nodes on reindex detection)
- when a new node joined the cluster
- on administrator request
- on data import
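To see which snapshots exist and how old they are, you can simply list the snapshot directories above (the paths below are examples, not universal defaults):

# List index snapshots, newest first (adjust the two home paths to your environment).
JIRA_LOCAL_HOME=/var/atlassian/application-data/jira   # example path
JIRA_SHARED_HOME=/data/jira/sharedhome                 # example path
ls -lt "$JIRA_LOCAL_HOME/caches/indexesV2/snapshots"
ls -lt "$JIRA_SHARED_HOME/caches/indexesV2/snapshots"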
Index snapshot - quality
Before creating (and sending) an index snapshot to the shared home, the node always checks whether its index is consistent. If the index is not consistent, the operation is not performed, and this is visible only in the logs of the node that was asked to create the index snapshot:
Example log message: any time a node is requested to create an index snapshot and fails the index consistency check
ERROR Index backup failed. Index backup can be done only on consistent index.
Example log message: node1 requested an index snapshot from node2
ERROR Note that node: [node1] is waiting for an index and failed to restore the index from shared and from this node
This state require admin action, Both nodes: [node1] and [node2], must obtain a consistent index.
Please check KB: https://confluence.atlassian.com/x/OYNyQg to find out how can you solve this problem.
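To check whether any node has hit these errors, you can grep each node's application log for the messages quoted above (atlassian-jira.log lives in <local_home_directory>/log by default):

# Search the application log for the snapshot-quality errors quoted above; run this on every node.
grep -E 'Index backup failed|is waiting for an index and failed to restore' atlassian-jira.log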
How to make sure there is a consistent index snapshot in the shared home
Full reindex
Running a full reindex on any node will trigger the creation of an index snapshot and send it to the shared home.
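If you prefer to trigger this from the command line rather than the admin UI, the REST reindex resource can be used; a sketch, with the base URL and credentials as placeholders (the call requires Jira admin permissions):

# Trigger a full (foreground) reindex via REST instead of the admin UI.
# Replace the base URL and credentials with your own.
curl -u admin:admin-password -X POST "https://jira.example.com/rest/api/2/reindex?type=FOREGROUND"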
Index copy
If there is a node in the cluster that has a consistent index, copying this index to any other node via the admin panel (Admin/System/Indexing/Copy the Search Index from another node) will result in an index snapshot being created in the shared home.
With the 9.0 changes, the chance of running into JRASERVER-72125 should be even lower.
Please make sure that the process of starting new nodes includes a check that an index snapshot is available in the shared home (see the sketch after this list):
- make sure that index snapshots are created by the scheduler
- any operation that triggers large-scale indexing (for example, a project import) should be followed by creating an index snapshot
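A minimal pre-start check could look like the sketch below; the shared home path and the freshness threshold are assumptions to adapt to your environment (24 hours matches the default snapshot scheduler interval mentioned above):

#!/usr/bin/env bash
# Sketch: refuse to start a new node unless the shared home contains a recent index snapshot.
SHARED_HOME="/data/jira/sharedhome"   # assumption - set to your shared home path
SNAPSHOT_DIR="$SHARED_HOME/caches/indexesV2/snapshots"
MAX_AGE_HOURS=24                      # assumption - matches the default 24h scheduler
recent_snapshot="$(find "$SNAPSHOT_DIR" -maxdepth 1 -name 'IndexSnapshot_*' -mmin "-$((MAX_AGE_HOURS * 60))" | head -n 1)"
if [ -n "$recent_snapshot" ]; then
  echo "Found fresh index snapshot: $recent_snapshot - OK to start the node."
else
  echo "No index snapshot newer than ${MAX_AGE_HOURS}h in $SNAPSHOT_DIR - create one first." >&2
  exit 1
fi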
Index Analyzer
When only a small number of issues is affected, Jira's index analyzer can list the affected issues and fix them on a specific node. Check How to use Jira's index analyzer to fix index inconsistencies
Troubleshooting
Please use the following grep across all nodes' logs to see log messages related to indexing and index management:
grep 'IndexUtils\|ArchiveUtils\|DefaultIssueIndexer\|DefaultClusterManager\|DefaultIndexCopyService\|DefaultNodeReindexService\|SnapshotDeletionPolicyContributionStrategy\|DefaultIndexManager' atlassian-jira.log
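The log lives on each node, so the grep has to be run on every node; one way to do that in a single pass (node names and log path are placeholders for your environment):

# Run the same grep on every node in one pass (node names and log path are placeholders).
for node in jira-node1 jira-node2 jira-node3; do
  echo "=== $node ==="
  ssh "$node" "grep 'IndexUtils\|ArchiveUtils\|DefaultIssueIndexer\|DefaultClusterManager\|DefaultIndexCopyService\|DefaultNodeReindexService\|SnapshotDeletionPolicyContributionStrategy\|DefaultIndexManager' /var/atlassian/application-data/jira/log/atlassian-jira.log"
done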
Q&A
How does Jira update the index with changes made after the index snapshot was created?
Every time an index snapshot is restored (whether it is a few hours old or was "just" created by another node), we run an "index fixer" after restoring the snapshot. This does not block users from accessing the node (/status may already report that the node is running), so the fixing can happen in the background.
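For example, you can watch when a node starts reporting itself as running (host and port are placeholders); keep in mind that a RUNNING state does not mean the index fixer has finished:

# Poll the node's /status endpoint until it reports RUNNING (host and port are placeholders).
# RUNNING does not mean the index fixer has finished - it keeps working in the background.
until curl -fsS "http://jira-node1:8080/status" | grep -q '"state":"RUNNING"'; do
  sleep 10
done
echo "Node reports RUNNING; check the [INDEX-FIXER] log entries to follow the background fixing."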
In Jira 8.20 we still run two index fixers:
- legacy fixer: compares the max issue update time from the database with the max issue update time from the restored index and re-indexes all issues in that time range
- version-based fixer: uses the version table to determine which issues (and related entities) need to be re-indexed (or deleted from the index)
In Jira 9.3 we removed the legacy fixer, as it is no longer needed now that all entities have versions.
To see all logs related to fixing the index after restoring it from a snapshot, grep the log for: [INDEX-FIXER]
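For example (the brackets are literal, so a fixed-string grep is simplest):

grep -F '[INDEX-FIXER]' atlassian-jira.log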
How do we calculate the time range on which the fixer should run?
If the index has meta information with a timestamp, we use that timestamp as the start of the time range (only snapshots created by a full foreground reindex have this timestamp) and the max issue update time from the database as the end of the range.
If the index has no meta information with a timestamp, we use the max issue update time from the index as the start of the range and the max issue update time from the database as the end.