Differences between indexed and non-indexed searches
Surround SCM supports indexed and non-indexed searches for text in files. See Searching for text in files. Indexed searches return results more quickly and include binary files, but some search terms, such as a single common word or punctuation only, return results only in a non-indexed search.
The following information can help you understand more about indexed versus non-indexed searches.
Non-indexed searches
Non-indexed searches look for matches only in text files, using a line-by-line search. Binary files are not searched. When you search, Surround SCM opens each text file and scans it for the exact phrase you entered as the search term. These searches can be slow if the branch or repository you are searching contains thousands of files.
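The line-by-line behavior described above can be pictured with a minimal Python sketch. It is an illustration only, not Surround SCM code: the function name `non_indexed_search`, the in-memory `files` dictionary, the use of `bytes` values to stand in for binary files, and the case-sensitive match are all assumptions made for the example.

```python
def non_indexed_search(files, phrase):
    """Return (file name, line number, line) for each line containing the exact phrase."""
    hits = []
    for name, content in files.items():
        # Assumption: bytes values stand in for binary files, which are not searched.
        if isinstance(content, bytes):
            continue
        # Every text file is opened and scanned one line at a time.
        for line_number, line in enumerate(content.splitlines(), start=1):
            if phrase in line:
                hits.append((name, line_number, line))
    return hits


# Hypothetical sample data for illustration.
files = {
    "build.txt": "Copy the executable to bin\nRun tests",
    "logo.png": b"\x89PNG the executable",
}
print(non_indexed_search(files, "the executable"))
```

Because every text file is scanned in full, the cost grows with the number and size of files, which is consistent with the slowdown noted for large branches and repositories.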
Indexed searches
Indexed searches look for matches in text and binary files. To perform indexed searches, the indexing server must be running and indexing must be turned on for each branch you want to search. See Indexing branches for optimized searches.
Indexed searches have two steps:
1. A search engine-style search is performed by the indexing server. The server looks for the individual words you entered as the search term to narrow the list of potential file matches and assigns a relevance value.
2. Each text file included as a result from step 1 is opened and searched for the exact phrase you entered as the search term. This is a line-by-line search similar to a non-indexed search except the search is performed on a smaller subset of files, which makes the operation much faster.
The indexed search results contain binary files that pass step 1 and text files that pass step 2. Binary files in the results contain the words from the search term in close proximity to each other, but may not contain the exact phrase.
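The two steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the indexing server's actual algorithm: `tokenize`, `build_index`, and `indexed_search` are hypothetical names, each file is modeled as a `(text, is_binary)` pair where the text of a binary file stands in for whatever content the index extracted from it, step 1 is assumed to require every word in the search term, and relevance values, word proximity, and unindexed common words are left out.

```python
import re


def tokenize(text):
    """Split text into a set of lowercase words (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(files):
    """Map each word to the set of file names that contain it."""
    index = {}
    for name, (text, _is_binary) in files.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(name)
    return index


def indexed_search(files, index, phrase):
    words = tokenize(phrase)
    if not words:
        return []
    # Step 1: use the index to narrow the list to files containing every word.
    candidates = set.intersection(*(index.get(word, set()) for word in words))
    results = []
    for name in sorted(candidates):
        text, is_binary = files[name]
        if is_binary:
            # Binary files are reported after step 1 without a phrase check.
            results.append(name)
        elif any(phrase in line for line in text.splitlines()):
            # Step 2: text files must contain the exact phrase on some line.
            results.append(name)
    return results


# Hypothetical sample data for illustration.
files = {
    "a.txt": ("See the release notes for details", False),
    "b.txt": ("notes about the next release", False),
    "c.pdf": ("release candidate notes", True),
    "d.txt": ("unrelated text", False),
}
index = build_index(files)
print(indexed_search(files, index, "release notes"))
```

In this sample, `b.txt` passes step 1 because it contains both words but is dropped in step 2 because it lacks the exact phrase, while `c.pdf` is kept after step 1 alone. Only the three candidate files are opened, rather than all four.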
Comparison
Support for: | Indexed search | Non-indexed search | More information |
---|---|---|---|
Binary files in search results | Yes | No | File types include Adobe PDF (.pdf), Microsoft Excel (.xls, .xlsx), Microsoft Word (.doc, .docx), Open Document Format (.odf), and Rich Text Format (.rtf). |
Common words, such as 'for' and 'if', included in the search term | Only in a phrase | Yes | Common words, such as 'for', 'if', 'in', and 'or', are not indexed. Searching for 'the' finds results in a non-indexed search, but not in an indexed search. Searching for 'the executable' may find results in both types of searches. |
Punctuation-only search term | Only when using wildcards | Yes | Punctuation is not indexed. Searching for '&&' does not return any results in an indexed search unless you use wildcards. |
Relevance value to indicate how closely results match the search term | Yes | No | |
Matching text beyond the first 100,000 lines in a file or first 1,000 characters in a line | Yes. The file is included in the results with a relevance value, but specific line numbers are not included. | No | When searching text files, only the first 100,000 lines in each file and the first 1,000 characters in each line are searched. These limits help prevent Surround SCM from becoming unresponsive when searching large files. |
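The line and character limits in the last row of the table can be illustrated with a short Python sketch. The constants come from the table, but everything else is an assumption for the example: the function name `search_within_limits` is hypothetical, and the sketch simply truncates each line, so how Surround SCM treats a phrase that straddles the 1,000-character boundary is not something this code claims to reproduce.

```python
MAX_LINES = 100_000          # lines searched per file, from the table above
MAX_CHARS_PER_LINE = 1_000   # characters searched per line, from the table above


def search_within_limits(text, phrase, max_lines=MAX_LINES, max_chars=MAX_CHARS_PER_LINE):
    """Return the line numbers where the phrase occurs within the searched region."""
    matches = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if line_number > max_lines:
            break  # lines beyond the limit are not searched
        if phrase in line[:max_chars]:  # only the start of each line is searched
            matches.append(line_number)
    return matches


# The first line holds the phrase only after character 1,000, so only line 2 is reported.
sample = "x" * 1000 + "needle\nneedle here"
print(search_within_limits(sample, "needle"))  # [2]
```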
The following common words are not indexed:
- a, an, and, are, as, at
- be, but, by
- for
- if, in, into, is, it
- no, not
- of, on, or
- such
- that, the, their, then, there, these, they, this, to
- was, will, with
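As a rough illustration of why some search terms return nothing from the index, the following Python sketch filters a search term against the list above. The word list is copied from this section. The function name `indexable_words` and the tokenization rule (lowercase runs of letters and digits) are assumptions for the example and may differ from how the indexing server splits text.

```python
import re

# Common words that are not indexed, copied from the list above.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at",
    "be", "but", "by",
    "for",
    "if", "in", "into", "is", "it",
    "no", "not",
    "of", "on", "or",
    "such",
    "that", "the", "their", "then", "there", "these", "they", "this", "to",
    "was", "will", "with",
}


def indexable_words(search_term):
    """Return the words in a search term that an index lookup could use."""
    words = re.findall(r"[a-z0-9]+", search_term.lower())
    return [word for word in words if word not in STOP_WORDS]


print(indexable_words("the"))             # []
print(indexable_words("the executable"))  # ['executable']
print(indexable_words("&&"))              # []
```

An empty list corresponds to the cases in the comparison table where an indexed search finds nothing: a common word on its own or a punctuation-only term without wildcards.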