Search does not return expected results
Problem
When performing a search in Fisheye, the results returned are not what is expected.
Example replication steps
Commit code to a repo such as the following:
@SomeClass(Arg, Arg.Scanning.C, Arg2Scanned)
Commit more code to the repo that builds on the previous revision:
[@SomeClass(Arg, Arg.Scanning.C, Arg2Scanned)]
- Use '@' in Quick search to search across all repos
- Use Simple search within the repo and search for '@'
- Use Simple search within the repo and search for '[@'
Results
- Quick search returns both revisions
- Simple search for '@' returns no content
- Simple search for '[@' returns only the second revision
Explanation
During indexing, Fisheye/Crucible creates special tokens that are used in search. Tokens for file content usually represent single words, so to find anything in the text, a user has to type a whole word in the search box.
To find the next token, each character is checked in turn:
- each character is put into one of three groups:
  - standard: all standard letters and digits
  - operators: characters that can have a special meaning (currently the set + - * / % \ = < > ! ~ . & & | ^ [ ] _ @)
  - pauses: all characters that separate words, such as spaces, commas, semicolons, brackets etc.
- each token contains characters from the same group
- if the next character is from another group, the scan ends and the token is created
- characters from the pause group are omitted
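
The grouping rules above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not Fisheye's actual implementation: the names `OPERATORS`, `group_of` and `tokenize` are hypothetical, and treating "standard letters and digits" as anything for which `str.isalnum()` is true is an assumption.

```python
# Illustrative sketch only; not Fisheye's real tokenizer.
# Operator set copied from the list in this article.
OPERATORS = set("+-*/%\\=<>!~.&|^[]_@")


def group_of(ch):
    """Classify one character as 'standard', 'operator' or 'pause'."""
    if ch.isalnum():  # assumption: letters and digits are "standard"
        return "standard"
    if ch in OPERATORS:
        return "operator"
    return "pause"  # everything else separates words


def tokenize(text):
    """Split text into runs of characters that share a group."""
    tokens = []
    current = ""
    current_group = None
    for ch in text:
        group = group_of(ch)
        # A change of group closes the token collected so far.
        if group != current_group and current:
            tokens.append(current)
            current = ""
        # Pause characters are never stored in a token.
        if group != "pause":
            current += ch
        current_group = group
    if current:
        tokens.append(current)
    return tokens


print(tokenize("@SomeClass(Arg, Arg.Scanning.C, Arg2Scanned)"))
print(tokenize("[@SomeClass(Arg, Arg.Scanning.C, Arg2Scanned)]"))
```

Under these assumptions, the first call yields a leading '@' token, while the second yields a leading '[@' token, because '[' and '@' are both in the operator group.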
Consider @SomeClass(Arg, Arg.Scanning.C, Arg2Scanned). It is divided into the tokens '@', 'SomeClass', 'Arg', 'Arg', '.', 'Scanning', '.', 'C', 'Arg2Scanned'.
If we add '[' at the beginning, the first token becomes '[@', because both characters are operators and adjacent characters from the same group are merged into one token.
All tokens in the query have to be found in the content for it to be displayed. So if a user types only '@', content that contains '[@' will not be displayed. Likewise, if a user types '[@', content that has only '@' will not be displayed.
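
The matching rule can be illustrated with a small, simplified model. The sketch below assumes that a query matches when every query token appears as a whole token in the content; the helper names `split_tokens` and `matches`, and the regular expression used to approximate the tokenizer, are hypothetical and not taken from Fisheye.

```python
import re

# Operator characters as listed in this article (assumed to be complete).
OPERATOR_CHARS = "+-*/%\\=<>!~.&|^[]_@"

# A token is a run of ASCII letters/digits or a run of operator characters;
# anything else is treated as a pause and skipped.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[" + re.escape(OPERATOR_CHARS) + r"]+")


def split_tokens(text):
    """Approximate the tokenization described in this article."""
    return TOKEN_RE.findall(text)


def matches(query, content):
    """Return True if every query token is also a token of the content."""
    content_tokens = set(split_tokens(content))
    return all(tok in content_tokens for tok in split_tokens(query))


first = "@SomeClass(Arg, Arg.Scanning.C, Arg2Scanned)"
second = "[@SomeClass(Arg, Arg.Scanning.C, Arg2Scanned)]"

print(matches("@", first))    # '@' is a token of the first revision
print(matches("@", second))   # the second revision only has '[@'
print(matches("[@", first))
print(matches("[@", second))
```

In this model the four calls print True, False, False and True, which mirrors why '@' does not find content that starts with '[@' and vice versa.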