[LUCENE-5159] compressed diskdv sorted/sortedset termdictionaries - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.5, 6.0
Component/s: core/index
Labels: None

Lucene Fields: New

Description

Sorted/SortedSet give you ordinal(s) per document, but then separately have a "term dictionary" of all the values.

You can do a few operations on these:

ord -> term lookup (e.g. retrieving facet labels)
term -> ord lookup (reverse lookup: e.g. fieldcacherangefilter)
get a term enumerator (e.g. merging, ordinalmap construction)
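
To make the three operations concrete, here is a minimal in-memory sketch. It is not the diskdv code: `TermDictSketch` and its method names are hypothetical, terms are plain Java strings rather than byte sequences, and the "not found" return value simply follows the `java.util.Arrays.binarySearch` convention.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical in-memory stand-in for a Sorted/SortedSet term dictionary.
// Terms are unique and sorted, so a term's ordinal is simply its index.
final class TermDictSketch {
    private final String[] sortedTerms;

    TermDictSketch(String... terms) {
        // Deduplicate and sort so that ordinals follow term order.
        this.sortedTerms = Arrays.stream(terms).distinct().sorted().toArray(String[]::new);
    }

    // ord -> term lookup (e.g. resolving a facet ordinal to its label).
    String lookupOrd(int ord) {
        return sortedTerms[ord];
    }

    // term -> ord lookup via binary search. Returns the ordinal if the term
    // exists, otherwise -(insertionPoint) - 1, as Arrays.binarySearch does.
    int lookupTerm(String term) {
        return Arrays.binarySearch(sortedTerms, term);
    }

    // Term enumeration in ordinal order (e.g. for merging).
    Iterator<String> termsEnum() {
        return Arrays.asList(sortedTerms).iterator();
    }

    public static void main(String[] args) {
        TermDictSketch dict = new TermDictSketch("cherry", "apple", "banana", "apple");
        System.out.println(dict.lookupOrd(0));
        System.out.println(dict.lookupTerm("banana"));
        System.out.println(dict.lookupTerm("blueberry"));
        dict.termsEnum().forEachRemaining(System.out::println);
    }
}
```

A negative result from `lookupTerm` still tells a range filter where a missing bound would fall in ordinal space, which is why the insertion point is encoded rather than returning a bare -1.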

The current implementation for diskdv is the simplest thing that could possibly work: under the hood it just makes a binary DV for these (treating ordinals as document IDs). When the terms are fixed length, you can address a term directly with multiplication. When they are variable length, though, we have to store a packed ints structure in RAM.
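
The difference between the two addressing modes can be sketched roughly as follows. This is an illustration only: `TermAddressingSketch` and its methods are hypothetical names, and a plain `long[]` stands in for the packed ints structure, which would store the same offsets in fewer bits.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of addressing terms stored as one concatenated byte blob.
final class TermAddressingSketch {

    // Fixed-length terms: term `ord` starts at ord * termLength, so no
    // per-term metadata is needed.
    static byte[] fixedLookup(byte[] data, int termLength, int ord) {
        int start = ord * termLength;
        return Arrays.copyOfRange(data, start, start + termLength);
    }

    // Variable-length terms: one offset per term (plus a final end offset)
    // must be kept around. A long[] is used here as an uncompressed stand-in.
    static long[] buildOffsets(byte[][] terms) {
        long[] offsets = new long[terms.length + 1];
        for (int i = 0; i < terms.length; i++) {
            offsets[i + 1] = offsets[i] + terms[i].length;
        }
        return offsets;
    }

    // Concatenates all terms back to back, with no separators.
    static byte[] concat(byte[][] terms) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] term : terms) {
            out.write(term, 0, term.length);
        }
        return out.toByteArray();
    }

    // Term `ord` occupies the byte range [offsets[ord], offsets[ord + 1]).
    static byte[] variableLookup(byte[] data, long[] offsets, int ord) {
        return Arrays.copyOfRange(data, (int) offsets[ord], (int) offsets[ord + 1]);
    }

    public static void main(String[] args) {
        byte[] fixed = "aabbcc".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(fixedLookup(fixed, 2, 2), StandardCharsets.UTF_8));

        byte[][] terms = {
            "a".getBytes(StandardCharsets.UTF_8),
            "bb".getBytes(StandardCharsets.UTF_8),
            "ccc".getBytes(StandardCharsets.UTF_8),
        };
        byte[] data = concat(terms);
        long[] offsets = buildOffsets(terms);
        System.out.println(new String(variableLookup(data, offsets, 1), StandardCharsets.UTF_8));
    }
}
```

The offsets array grows by one entry per unique term, which is the per-value RAM cost of the variable-length case.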

This variable length case is overkill and chews up a lot of RAM if you have many unique values. It also chews up a lot of disk since all the values are just concatenated (no sharing).
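
As a rough illustration of what "no sharing" costs on disk, the sketch below compares the bytes needed to concatenate sorted terms against the bytes needed if each term stored only the suffix that differs from its predecessor. This is an assumed scheme for illustration, not necessarily what the attached patch does; `PrefixSharingSketch` is a hypothetical name, and the count ignores the small per-term overhead a real encoding would need for prefix and suffix lengths.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: byte cost of plain concatenation vs. prefix sharing.
final class PrefixSharingSketch {

    // Number of leading bytes that a and b have in common.
    static int sharedPrefixLength(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        int i = 0;
        while (i < limit && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Bytes used when every term is written out in full.
    static int concatenatedBytes(String... sortedTerms) {
        int total = 0;
        for (String term : sortedTerms) {
            total += term.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Bytes used when each term stores only the suffix that differs from the
    // previous term (length bookkeeping is deliberately not counted).
    static int suffixBytes(String... sortedTerms) {
        int total = 0;
        byte[] previous = new byte[0];
        for (String term : sortedTerms) {
            byte[] current = term.getBytes(StandardCharsets.UTF_8);
            total += current.length - sharedPrefixLength(previous, current);
            previous = current;
        }
        return total;
    }

    public static void main(String[] args) {
        String[] terms = {"lucene", "lucid", "lucky"};
        System.out.println(concatenatedBytes(terms));
        System.out.println(suffixBytes(terms));
    }
}
```

Because the dictionary is sorted, neighbouring terms tend to share long prefixes, so the gap between the two numbers widens as the number of unique values grows.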

Attachments

LUCENE-5159.patch | 09/Aug/13 23:15 | 31 kB | Robert Muir
LUCENE-5159.patch | 08/Aug/13 17:52 | 24 kB | Robert Muir
LUCENE-5159.patch | 08/Aug/13 17:25 | 24 kB | Robert Muir

People

Assignee: Unassigned

Reporter: Robert Muir

Votes: 1

Watchers: 3

Dates

Created: 08/Aug/13 17:23

Updated: 28/Aug/22 13:51

Resolved: 10/Aug/13 01:16