OpenSearch upgrade guide
What's changed
In Confluence 9.0, OpenSearch became an opt-in feature. If you’re an app developer using the Lucene API independently, consider moving to the Confluence search v2 API to minimize disruption to customers who take advantage of the OpenSearch engine capabilities. We will maintain compatibility with the v2 search API as much as possible, so most existing integrations should not run into issues.
Testing your plugins with OpenSearch
Test your plugins to make sure they work as expected with OpenSearch. Refer to the Configuring OpenSearch with Confluence guide to get started.
Deprecations and changes
Index fields must be consistent across documents
We use FieldDescriptor to add a value, such as filename, as a field to an indexed document, typically from an Extractor2. Each FieldDescriptor is associated with a mapping which describes how the field is indexed and queried. For example, the field might be indexed and queried by its type (string vs text vs long) and its analyzer.
Previously, you could add a particular field name with differing mappings between documents in the same index. For instance, you could add filename as text in one document, but as string in another.
What’s changed: After rollout, mixing field mappings will produce an error message in your log file that looks like the following:
Mapping for 'filename' (TextFieldMapping {name='filename'}) conflicts with existing mapping (StringFieldMapping {name='filename'})
On Lucene, this error currently has no adverse effects. On OpenSearch, however, a conflicting mapping means the field may be indexed incorrectly, or the whole document may fail to be indexed altogether.
When: Confluence 8.8
What’s new: To fix this, either use the same mapping for the field across all documents of an index, or use different field names; for example, filename.text and filename.string.
To better enforce this in your code, we recommend you use FieldMapping to declare your fields explicitly. For example, instead of this deprecated code:
// Constant
public static final String FILENAME = "filename";

// Extractor A
fields.add(new TextFieldDescription(FILENAME, docA.getName(), Stored.YES, new FilenameAnalyzerDescriptor()));

// Extractor B
fields.add(new TextFieldDescription(FILENAME, docB.getName(), Stored.YES, new FilenameAnalyzerDescriptor()));
write this code:
// Constant
public class MyFields implements FieldMappingsProvider {
    public static final TextFieldMapping FILENAME = TextFieldMapping.builder("filename")
            .store(true)
            .analyzer(new FilenameAnalyzerDescriptor())
            .build();

    @Override
    public Collection<FieldMapping> getFieldMappings() {
        return List.of(FILENAME);
    }
}

// Extractor A
fields.add(MyFields.FILENAME.createField(docA.getName()));

// Extractor B
fields.add(MyFields.FILENAME.createField(docB.getName()));
It’s best practice to explicitly register your mappings with FieldMappingsProvider in atlassian-plugin.xml so that they get created on the OpenSearch index when your plugin starts up. Alternatively, Confluence will create them dynamically when you index a document with those mappings.
<field-mappings-provider key="my-custom-fields" index="CONTENT" class="com.example.MyFields" />
AnalyzerDescriptor has been deprecated
Previously, AnalyzerDescriptor could be used to build a bespoke analyzer by specifying an arbitrary combination of a TokenizerDescriptor, CharFilterDescriptors, and TokenFilterDescriptors. This analyzer could then be used for indexing (for example, on TextFieldDescriptor) and for querying (for example, on PhraseQuery).
What’s changed: Bespoke analyzers defined with AnalyzerDescriptor have been deprecated, and will not work with OpenSearch.
When: Confluence 8.7
What’s new: On OpenSearch, you can only use predefined analyzers provided by Confluence (i.e. not AnalyzerDescriptor).
Explore the current list of supported predefined analyzers in Confluence in the All Current Implementing Classes section of MappingAnalyzerDescriptor.
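For illustration, here’s a sketch of declaring a field with one of those predefined analyzers, reusing the builder pattern and the FilenameAnalyzerDescriptor from the FieldMapping example above. The field name my-plugin-title is hypothetical; pick whichever implementing class of MappingAnalyzerDescriptor fits your data.
// Sketch only: my-plugin-title is a hypothetical field name, and FilenameAnalyzerDescriptor
// stands in for any predefined analyzer descriptor shipped with Confluence.
public static final TextFieldMapping TITLE = TextFieldMapping.builder("my-plugin-title")
        .store(true)
        .analyzer(new FilenameAnalyzerDescriptor()) // predefined analyzer instead of a bespoke AnalyzerDescriptor
        .build();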
The contentBody field is no longer stored
In the Confluence content index, the contentBody field holds the indexed content of a document so it can be queried.
Previously, this field was stored, meaning it could be used to fetch the original value. For example, you could include it in the requestedField parameter of the search or scan method.
What’s changed: contentBody is no longer a stored field.
When: Confluence 8.7
What’s new: Instead, the original value of a document’s content is stored in a new, separate field called contentBody-stored. If you currently fetch document content using the contentBody field, use contentBody-stored instead.
Note that the contentBody field is still indexed and used for querying (for example, via CQL contentBody:foo).
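As a sketch of the rename, assuming your code already passes a set of requested field names to the search or scan call (confirm the exact overload and parameter in the SearchManager Javadoc):
// Only the field name changes; keep using whichever search/scan overload your plugin
// already calls to pass the requested fields.
Set<String> requestedFields = Set.of("contentBody-stored"); // previously Set.of("contentBody")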
SearchIndex has been deprecated and replaced with Index
SearchIndex is an enum that is useful only for system indices, such as CONTENT or CHANGE; previously, a plugin had to fall back to a raw index name to manage its own custom index. The enum is also cumbersome when the search platform is OpenSearch, because its value needs to be translated into the real name of the index stored on OpenSearch whenever a request is built.
What’s changed: SearchIndex has been deprecated, and is replaced by Index.
When: Confluence 8.7
What’s new: Index can be used for both system indices and custom indices. Using this new class provides some benefits, including:
A consistent way to interact with an index, whether it's a system or custom index, and whether the search platform is Lucene or OpenSearch.
Index name abstraction, which makes code neater and easier to maintain.
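Purely as a hypothetical illustration of the intent, the sketch below contrasts the two styles. The constant and factory names on Index are assumptions rather than confirmed API, so check the Index Javadoc for the real way to obtain system and custom index references.
// Hypothetical sketch: Index.CONTENT and Index.custom(...) are illustrative names, not confirmed API.
// Before: system indices only, via the deprecated enum
SearchIndex systemIndex = SearchIndex.CONTENT;
// After: one abstraction for both system and plugin-owned custom indices,
// regardless of whether the platform is Lucene or OpenSearch
Index contentIndex = Index.CONTENT;
Index customIndex = Index.custom("my-plugin-index");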
Sorting changes and updates
TextFieldMapping
Lucene allows you to sort search queries on a text field (i.e. TextFieldMapping); however, such requests are problematic because text fields are tokenized and can be text heavy, leading to inaccurate and inefficient results. It’s recommended instead to sort on a keyword field (i.e. StringFieldMapping).
What’s changed: OpenSearch doesn’t allow sorting on text fields, so such an operation will now result in the following error:
Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead.
When: Confluence 8.9
What’s new: A workaround is to create the textual field as a keyword, i.e. a StringFieldMapping.
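As a sketch of that workaround, assuming StringFieldMapping offers a builder analogous to the TextFieldMapping builder shown earlier (the builder shape and the field name my-label-sort are assumptions):
// Sketch only: declare the sortable value as a keyword field rather than a text field.
public static final StringFieldMapping LABEL_SORT = StringFieldMapping.builder("my-label-sort")
        .store(true)
        .build();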
UserAttributeSort
Previously, Lucene allowed sort requests by user attributes (created by UserAttributeSort). This could be resource intensive because user details had to be fetched separately for each document found by search queries before sorting.
What’s changed: The class UserAttributeSort has been deprecated, and is not supported on OpenSearch.
When: Confluence 8.9
What’s new: There is no workaround at this stage.
LowercaseFieldSort
The class LowercaseFieldSort allows sorting based on the lowercase value of a keyword field.
What’s changed: OpenSearch doesn’t support lowercase sort natively; therefore, to implement LowercaseFieldSort efficiently, changes have been made to the StringFieldMapping and LowercaseFieldSort classes.
When: Confluence 8.9
What’s new:
StringFieldMapping
We’ve introduced two new properties onto StringFieldMapping which are only relevant for OpenSearch. Please note that these changes will only take effect if Confluence is configured to use OpenSearch.
asLowercase, if true:
Index phase: no impact.
Search phase: this setting indicates that the field is already stored as lowercase, hence Confluence will use the field as is to sort results.
withLowercase, if true (see the sketch after this list):
Index phase: a sub-field will be created to store a lowercase version of the original field.
Search phase: the sub-field will be used instead of the original field for sorting.
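As a sketch of declaring a sortable keyword field with a lowercase sub-field, assuming StringFieldMapping exposes builder methods matching the property names above (the method names, and the field name my-display-name, are assumptions; confirm them in the StringFieldMapping Javadoc):
// Sketch only: withLowercase(true) asks Confluence to index a lowercase sub-field
// that can later be used for sorting on OpenSearch.
public static final StringFieldMapping DISPLAY_NAME = StringFieldMapping.builder("my-display-name")
        .store(true)
        .withLowercase(true)
        .build();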
LowercaseFieldSort
We’ve introduced a new constructor to the LowercaseFieldSort class which takes a StringFieldMapping argument, instead of just the field name, in addition to the sort order. This argument tells Confluence how to handle lowercase sort on the field in OpenSearch, based on how it stores the lowercase value.
For backward compatibility, Confluence will fall back to using a script sort on OpenSearch in these scenarios:
LowercaseFieldSort is not constructed with a StringFieldMapping.
LowercaseFieldSort is constructed with a StringFieldMapping, but the field mapping has both asLowercase and withLowercase as false.
Because this fallback adversely impacts performance, we recommend that you update existing code using this sort to construct LowercaseFieldSort with an appropriate StringFieldMapping.
For detection purposes, Confluence will output the following warning log whenever this fallback is used:
Using script sort for field: {field_name}. This will significantly impact query performance. Please consider migrating the field to support lowercase sub field for better performance
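A sketch of the recommended construction, reusing the hypothetical DISPLAY_NAME mapping from the earlier sketch; the exact constructor signatures and the SearchSort.Order argument below are assumptions, so confirm them in the LowercaseFieldSort Javadoc:
// Before: constructed with only a field name; on OpenSearch this falls back to the slower script sort.
LowercaseFieldSort scriptSorted = new LowercaseFieldSort("my-display-name", SearchSort.Order.ASCENDING);
// After: pass the StringFieldMapping so Confluence can sort on the lowercase sub-field directly.
LowercaseFieldSort subFieldSorted = new LowercaseFieldSort(DISPLAY_NAME, SearchSort.Order.ASCENDING);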
Result window limit
You can paginate your search window by defining the limit and offset on the ISearch object. The larger these numbers are (limit + offset), the more index documents the search engine needs to sift through; this range is known as the “result window”, and memory utilization grows proportionally with it. For example, returning the thousandth page of a search (at 20 results per page) would require significantly more resources than returning the first page.
In Lucene, there are currently no limits to this result window.
What’s changed: By default, OpenSearch limits the result window to 10,000, i.e. the number of results that can be requested in a search. Requests that exceed this limit will be rejected, and you’ll get an error that looks like the following:
Result window is too large, from + size must be less than or equal to: [10000] but was [10010]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.
When: Confluence 8.9
What’s new: For your search query to work on OpenSearch, you will need to pass a small number as the limit. You’ll also need to restrict the number of pages that users can navigate to; in other words, keep the offset small so that limit + offset stays within the result window.
If you need to iterate through a large amount of data, consider using the SearchManager.scan method.
Alternatively, if your search is sorted by field(s) (for example, modified, _id), OpenSearch also provides a way to paginate your search efficiently using searchAfter (instead of offset), which is unaffected by result window limits. However, this parameter is not supported on Lucene.
Explore OpenSearch documentation on search_after.
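As a sketch of keeping page requests inside the result window, assuming a ContentSearch constructor that takes a query, sort, filter, offset, and limit, and that query, sort, and filter are whatever your plugin already builds (adapt this to the constructor or builder you actually use):
// Sketch only: validate limit + offset against the default 10,000 result window before searching.
int pageSize = 20;
int pageNumber = 500;                       // 1-based page requested by the user
int offset = (pageNumber - 1) * pageSize;   // 9,980
if (offset + pageSize > 10_000) {
    throw new IllegalArgumentException("Requested page exceeds the OpenSearch result window");
}
ISearch search = new ContentSearch(query, sort, filter, offset, pageSize);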
Search results next page token change
Previously, when performing a search (SearchManager.search), the returned SearchResults object has a method getNextPageSearch which can be used to retrieve the next page of the search, if it exists. The getNextPageSearch property was always returned with a token, so that subsequent searches could be performed against the same version of the index used in the initial search. This ensured consistency between consecutive paginated searches, which is particularly useful for iterating through a large number of documents (for example, as part of batch processing).
Generating a token consumes resources, which is wasteful when performed on every search instead of only when it’s needed.
When: Confluence 9.0
What’s changed: Now, getNextPageSearch will not be returned with a token on OpenSearch. To avoid breaking existing plugins, it’s still returned for Lucene, but this may change in the future. To obtain a token on your next page search, set ISearch.generatesToken() to true (default: false) when performing the initial search with SearchManager.search.
If your code relies on this version token to maintain data consistency across paginated searches, make sure you set ISearch.generatesToken() to true.
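As a sketch of opting in, assuming your search object can be subclassed to override ISearch.generatesToken() and that getNextPageSearch returns a further ISearch (ContentSearch is used purely as an example; if your Confluence version offers a builder or setter for this flag, prefer that instead):
// Sketch only: override generatesToken() so the initial search returns a version token.
ISearch tokenGeneratingSearch = new ContentSearch(query, sort, filter, 0, pageSize) {
    @Override
    public boolean generatesToken() {
        return true;
    }
};
SearchResults firstPage = searchManager.search(tokenGeneratingSearch);
ISearch nextPage = firstPage.getNextPageSearch(); // carries the token for consistent paging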
Tips
Rebuilding custom indices
When a plugin maintains a custom index, we recommend that you create a listener for the ReIndexRequestEvent. This listener should rebuild the custom index, which ensures that when an admin requests to rebuild the index (via the Content Indexing admin page), all indices get rebuilt, including custom ones within plugins.
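A minimal sketch of such a listener, assuming the standard atlassian-event annotations; MyCustomIndexManager and its rebuild() method are hypothetical stand-ins for whatever component owns your custom index, and ReIndexRequestEvent’s package should be confirmed against your Confluence version:
import com.atlassian.event.api.EventListener;
import com.atlassian.event.api.EventPublisher;

public class CustomIndexRebuildListener {
    private final MyCustomIndexManager customIndexManager; // hypothetical owner of the plugin's custom index

    public CustomIndexRebuildListener(EventPublisher eventPublisher, MyCustomIndexManager customIndexManager) {
        this.customIndexManager = customIndexManager;
        eventPublisher.register(this); // remember to unregister this listener when the plugin is disabled
    }

    @EventListener
    public void onReindexRequested(ReIndexRequestEvent event) {
        // Rebuild the plugin's custom index whenever an admin triggers a full re-index
        customIndexManager.rebuild();
    }
}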