[SOLR-1321] Support for efficient leading wildcards search - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.4
Fix Version/s: 1.4
Component/s: Schema and Analysis
Labels:
None

Description

This patch implements the "reversed tokens" strategy for efficient leading-wildcard queries.

ReversedWildcardsTokenFilter reverses tokens and returns the reversed token (with positionIncrement == 0) and, optionally, the original token as well. Reversed tokens are prepended with a marker character to avoid collisions between legitimate tokens and reversed tokens. For example, "DNA" (once lowercased) would reverse to "and", colliding with the regular term "and"; with the marker character it becomes "\u0001and".
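The reversal-plus-marker idea can be sketched in plain Java without any Lucene dependency. This is only an illustration of the behavior described above, not the code from the patch: the class name `ReversedTokenSketch` and the methods `reverseWithMarker` and `expand` are hypothetical, and position increments are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the ReversedWildcardsTokenFilter from the patch.
final class ReversedTokenSketch {
    // Marker character mentioned in the issue description.
    static final char MARKER = '\u0001';

    // Reverse the token text and prepend the marker so that a reversed
    // token can never collide with an ordinary token such as "and".
    static String reverseWithMarker(String token) {
        return MARKER + new StringBuilder(token).reverse().toString();
    }

    // Return the terms that would be indexed for one input token:
    // optionally the original, followed by the marked reversed form.
    static List<String> expand(String token, boolean withOriginal) {
        List<String> out = new ArrayList<>();
        if (withOriginal) {
            out.add(token);
        }
        out.add(reverseWithMarker(token));
        return out;
    }

    public static void main(String[] args) {
        for (String term : expand("dna", true)) {
            // Show the invisible marker as '^' when printing.
            System.out.println(term.replace(MARKER, '^'));
        }
    }
}
```

In a real analyzer chain the reversed token would share the position of the original token, which is what positionIncrement == 0 expresses.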

This TokenFilter can be added to the analyzer chain that is used during indexing.

SolrQueryParser has been modified to detect such fields in the current schema and to treat them specially. First, SolrQueryParser examines the schema and collects a map of the fields in which reversed tokens are indexed. If there is at least one such field, it also calls QueryParser.setAllowLeadingWildcard(true). When building a wildcard query (in getWildcardQuery), the term text may be reversed so that the wildcards fall later in the term. This happens only when the field uses the reversing filter during indexing (as detected above) AND a wildcard character occurs at position 0 or 1 of the term. Otherwise the term text is processed as before, i.e. turned into a regular wildcard query.
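The query-side decision described above can be sketched as follows. This is a simplified, assumed model rather than the SolrQueryParser change itself: `LeadingWildcardSketch`, `hasLeadingWildcard`, and `rewrite` are hypothetical names, the set of reversed fields stands in for the map collected from the schema, and it is assumed that the rewritten term carries the same marker used at index time.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical illustration; not the SolrQueryParser code from the patch.
final class LeadingWildcardSketch {
    // Assumed to be the same marker the index-time filter prepends.
    static final char MARKER = '\u0001';

    // True when a wildcard character sits at position 0 or 1 of the term.
    static boolean hasLeadingWildcard(String term) {
        int limit = Math.min(2, term.length());
        for (int i = 0; i < limit; i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                return true;
            }
        }
        return false;
    }

    // Reverse the term (and add the marker) only for fields that index
    // reversed tokens; otherwise return the term unchanged so it becomes
    // a regular wildcard query.
    static String rewrite(String field, String term, Set<String> reversedFields) {
        if (reversedFields.contains(field) && hasLeadingWildcard(term)) {
            return MARKER + new StringBuilder(term).reverse().toString();
        }
        return term;
    }

    public static void main(String[] args) {
        Set<String> reversedFields = Collections.singleton("title");
        String[] terms = {"*ing", "r*n", "run*"};
        for (String term : terms) {
            // Show the invisible marker as '^' when printing.
            System.out.println(term + " -> "
                + rewrite("title", term, reversedFields).replace(MARKER, '^'));
        }
    }
}
```

After the rewrite, a leading wildcard such as the one in "*ing" ends up at the tail of the term, so the index lookup can proceed as an ordinary prefix-style wildcard scan over the marked reversed terms.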

Unit tests are provided to test the TokenFilter and the query parsing.

Attachments

- wildcards.patch, 31/Jul/09 15:05, 15 kB, Andrzej Bialecki
- wildcards-2.patch, 03/Aug/09 22:07, 19 kB, Andrzej Bialecki
- wildcards-3.patch, 09/Sep/09 22:11, 20 kB, Andrzej Bialecki
- SOLR-1321.patch, 10/Sep/09 14:18, 22 kB, Grant Ingersoll
- SOLR-1321.patch, 10/Sep/09 22:42, 23 kB, Grant Ingersoll
- SOLR-1321.patch, 11/Sep/09 04:17, 23 kB, Robert Muir

Issue Links

breaks: SOLR-9900 ReversedWildcardFilterFactory yields false positive hits for range query (Resolved)

is related to: SOLR-7466 Allow optional leading wildcards in complexphrase (Resolved)

People

Assignee: Grant Ingersoll

Reporter: Andrzej Bialecki

Votes: 1

Watchers: 1

Dates

Created: 31/Jul/09 15:04

Updated: 28/Dec/16 20:38

Resolved: 11/Sep/09 13:50