[LUCENE-5879] Add auto-prefix terms to block tree terms dict - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.2, 6.0
Component/s: core/codecs
Labels: None
Lucene Fields: New

Description

This cool idea to generalize numeric/trie fields came from Adrien:

Today, when we index a numeric field (LongField, etc.) we pre-compute
(via NumericTokenStream) outside of indexer/codec which prefix terms
should be indexed.

But this can be inefficient: you set a static precisionStep, and
always add those prefix terms regardless of how the terms in the field
are actually distributed. Yet typically in real-world applications
the terms have a non-random distribution.
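
To make the static scheme concrete, here is a minimal sketch of how a fixed precisionStep produces one prefix term per step for every value, whether or not any query ever benefits from it. The class and method names are hypothetical, and the `shift:hex` string form is only an illustration; it is not the byte encoding NumericTokenStream actually writes, and it assumes non-negative values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene code.
class StaticPrefixTerms {

    /**
     * Returns one term per precision step for the given value.
     * Each term is written as "shift:prefix", where prefix is the value
     * with its lowest `shift` bits dropped (shown in hex).
     */
    static List<String> prefixTerms(long value, int precisionStep, int valueBits) {
        List<String> terms = new ArrayList<>();
        for (int shift = 0; shift < valueBits; shift += precisionStep) {
            terms.add(shift + ":" + Long.toHexString(value >>> shift));
        }
        return terms;
    }

    public static void main(String[] args) {
        // A 16-bit value with precisionStep=4 always yields 4 terms.
        System.out.println(prefixTerms(0x1234L, 4, 16));
    }
}
```

The number of terms per value depends only on precisionStep and the value width, never on how the values are distributed, which is the inefficiency described above.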

So, it should be better if instead the terms dict decides where it
makes sense to insert prefix terms, based on how dense the terms are
in each region of term space.
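
The density-based idea can be sketched roughly as follows, assuming a simple rule: a prefix gets its own term only if at least `minItems` indexed terms share it. This is a toy model with hypothetical names, not the algorithm in the attached patches, which works inside the block tree writer as blocks are built.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration, not Lucene code.
class AutoPrefixSketch {

    /**
     * Counts how many terms share each proper prefix and keeps only the
     * prefixes that are dense enough to be worth indexing as prefix terms.
     */
    static SortedMap<String, Integer> densePrefixes(List<String> terms, int minItems) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (String term : terms) {
            for (int len = 1; len < term.length(); len++) {
                counts.merge(term.substring(0, len), 1, Integer::sum);
            }
        }
        // Drop sparse prefixes: too few terms follow them to pay for a prefix term.
        counts.values().removeIf(count -> count < minItems);
        return counts;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("aa1", "aa2", "aa3", "ab1", "zz9");
        System.out.println(densePrefixes(terms, 3));
    }
}
```

In this toy input the terms cluster under `a` and `aa`, so only those prefixes are kept; the lone `zz9` gets no prefix terms at all, whereas a static precisionStep would have added them regardless.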

This way we can speed up both term range queries (e.g. the infix
suggester) and numeric range queries, and it should also let us use
less index space.

This would also mean that min/maxTerm for a numeric field would now be
correct, unlike today, where the externally computed prefix terms are
placed after the full precision terms, causing hairy code like
NumericUtils.getMaxInt/Long. So optimizations like ~~LUCENE-5860~~
become feasible.

The terms dict can also do tricks that are not possible if you must
live on top of its APIs, e.g. to handle the adversarial/over-constrained
case where a given prefix has too many terms following it but finer
prefixes have too few (what block tree calls "floor term blocks").

Attachments

- LUCENE-5879.patch, 01/Apr/15 21:46, 327 kB, Michael McCandless
- LUCENE-5879.patch, 30/Mar/15 22:18, 226 kB, Michael McCandless
- LUCENE-5879.patch, 28/Mar/15 23:47, 225 kB, Michael McCandless
- LUCENE-5879.patch, 18/Mar/15 15:54, 220 kB, Michael McCandless
- LUCENE-5879.patch, 03/Oct/14 09:57, 218 kB, Michael McCandless
- LUCENE-5879.patch, 01/Oct/14 18:23, 292 kB, Michael McCandless
- LUCENE-5879.patch, 30/Sep/14 14:32, 281 kB, Michael McCandless
- LUCENE-5879.patch, 21/Sep/14 11:15, 279 kB, Michael McCandless
- LUCENE-5879.patch, 16/Sep/14 09:28, 268 kB, Michael McCandless
- LUCENE-5879.patch, 11/Sep/14 23:15, 244 kB, Michael McCandless
- LUCENE-5879.patch, 25/Aug/14 21:43, 217 kB, Michael McCandless
- LUCENE-5879.patch, 22/Aug/14 14:36, 213 kB, Michael McCandless
- LUCENE-5879.patch, 11/Aug/14 14:23, 149 kB, Michael McCandless
- LUCENE-5879.patch, 08/Aug/14 09:34, 133 kB, Michael McCandless

Issue Links

is blocked by

LUCENE-6367 Can PrefixQuery subclass AutomatonQuery? (Closed)

is related to

SOLR-6741 IPv6 Field Type (Open)

supersedes

LUCENE-5596 Support for index/search large numeric field (Closed)

People

Assignee: Michael McCandless

Reporter: Michael McCandless

Votes: 2

Watchers: 13

Dates

Created: 08/Aug/14 09:30

Updated: 28/Aug/22 14:13

Resolved: 07/Apr/15 09:12
