[LUCENE-3003] Move UnInvertedField into Lucene core - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.0-ALPHA
Component/s: core/index
Labels: None
Lucene Fields: New

Description

Solr's UnInvertedField lets you quickly look up all term ords for a
given doc/field.

Like FieldCache, it uninverts the index to produce this, and creates a
RAM-resident data structure holding the bits. Unlike FieldCache,
however, it can handle multiple values per doc, and it does not hold
the term bytes in RAM. Rather, it holds only term ords, and then uses
TermsEnum to resolve ord -> term.

This is great, e.g., for faceting, where you want to use int ords for
all of your counting, and only at the end need to resolve the
"top N" ords to their text.
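
The counting pattern described above can be sketched roughly as follows. This is only an illustration, not the UnInvertedField code: the class `OrdFacetSketch`, the in-memory `TERMS` array (standing in for the TermsEnum ord -> term lookup), and the `DOC_ORDS` table are all hypothetical names made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class OrdFacetSketch {
    // Hypothetical stand-in for the sorted term dictionary. In Lucene the
    // ord -> term lookup would go through TermsEnum, not an in-memory array.
    static final String[] TERMS = {"apple", "banana", "cherry", "date"};

    // Hypothetical multi-valued field: DOC_ORDS[doc] lists the term ords of that doc.
    static final int[][] DOC_ORDS = {
        {0, 2},
        {2},
        {1, 2, 3},
        {0},
    };

    // Count occurrences per ord using ints only; no term bytes are touched here.
    static int[] countOrds(int[] matchingDocs, int numTerms) {
        int[] counts = new int[numTerms];
        for (int doc : matchingDocs) {
            for (int ord : DOC_ORDS[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    // Resolve only the top N ords to text, at the very end.
    static List<String> topN(int[] counts, int n) {
        List<Integer> ords = new ArrayList<>();
        for (int ord = 0; ord < counts.length; ord++) {
            if (counts[ord] > 0) {
                ords.add(ord);
            }
        }
        // Highest count first; ties broken by lower ord.
        ords.sort((a, b) -> counts[b] != counts[a]
                ? Integer.compare(counts[b], counts[a])
                : Integer.compare(a, b));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(n, ords.size()); i++) {
            int ord = ords.get(i);
            out.add(TERMS[ord] + "=" + counts[ord]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] counts = countOrds(new int[] {0, 1, 2, 3}, TERMS.length);
        System.out.println(topN(counts, 2));
    }
}
```

The point of the split is that the hot loop works purely on int ords, and the comparatively expensive ord -> term resolution happens only N times.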

I think this is useful core functionality, and we should move most
of it into Lucene's core. It's a good complement to FieldCache. For
this first baby step, I just move it into core and refactor Solr's
usage of it.

After this, as separate issues, I think there are some things we could
explore/improve:

- The first pass, which allocates lots of tiny byte[], looks like it
  could be inefficient. Maybe we could use the byte slices from the
  indexer for this...

- We can improve the RAM efficiency of the TermIndex: if the codec
  supports ords, and we are operating on one segment, we should just
  use it. If not, we can use a more RAM-efficient data structure,
  e.g. an FST mapping to the ord.

- We may be able to improve on the main byte[] representation by
  using packed ints instead of delta-vInt?

- Eventually we should fold this ability into docvalues, i.e. we'd
  write the byte[] image at indexing time, and then loading would be
  fast, instead of uninverting.
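
For readers unfamiliar with the delta-vInt representation mentioned above, here is a minimal sketch of the general technique: a doc's sorted ords are stored as gaps, and each gap is written in 7-bit groups with a continuation bit. The class `DeltaVIntSketch` and its methods are hypothetical, and the byte order and other layout details are assumptions that may well differ from what UnInvertedField actually writes.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class DeltaVIntSketch {
    // Encode ascending ords as deltas, each delta as a variable-length int:
    // 7 payload bits per byte, high bit set when more bytes follow.
    static byte[] encode(int[] sortedOrds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrds) {
            int delta = ord - prev;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
            prev = ord;
        }
        return out.toByteArray();
    }

    // Decode by reading one vInt at a time and accumulating the deltas.
    static int[] decode(byte[] bytes) {
        List<Integer> ords = new ArrayList<>();
        int pos = 0;
        int prev = 0;
        while (pos < bytes.length) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;
            ords.add(prev);
        }
        int[] result = new int[ords.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = ords.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] ords = {3, 5, 300, 100000};
        byte[] encoded = encode(ords);
        System.out.println(encoded.length + " bytes for " + ords.length + " ords");
    }
}
```

Small gaps cost one byte each, which is why this works well for dense ord lists; a packed-ints layout would instead spend a fixed number of bits per value, trading that adaptivity for cheaper decoding.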

Attachments

- byte_size_32-bit-openjdk6.txt (01/Apr/11 02:37, 3 kB, Mark Miller)
- LUCENE-3003.patch (30/Mar/11 23:20, 88 kB, Michael McCandless)
- LUCENE-3003.patch (29/Mar/11 19:50, 77 kB, Michael McCandless)

Issue Links

breaks

- SOLR-3150 NPE when facetting using facet.prefix on an "empty" field (Closed)
- SOLR-3427 Faceting under some conditions throws NPE (Closed)

People

Assignee: Michael McCandless

Reporter: Michael McCandless

Votes: 0

Watchers: 4

Dates

Created: 29/Mar/11 19:48

Updated: 28/Aug/22 12:43

Resolved: 06/Mar/12 11:11