[LUCENE-6030] Add norms patched compression which uses table for most common values - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.0, 6.0
Component/s: None
Labels: None

Lucene Fields: New

Description

We added the PATCHED norms sub-format in Lucene 5.0, which uses a bitset to mark the documents that have the most common value (when more than 97% of the documents have that value). This works well for fields with one predominant value length plus a small number of docs with other, scattered values. Another common case, however, is a field with a handful of very common value lengths, such as a title field.

We can use a table (see TABLE_COMPRESSION) to store the most common values and reserve one ordinal for the "other" case; when a document has that ordinal, its value is looked up in a secondary patch table.
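
To make the idea concrete, here is a minimal in-memory sketch of the table-plus-patch scheme described above. It is not the code from the attached patch: the class name `PatchedTableNormsSketch`, its fields, and the use of a `TreeMap` as the secondary patch structure are all hypothetical, chosen only for illustration, and the real format would write packed ordinals to an index file rather than keep an `int[]`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: hypothetical names, not Lucene's actual norms format classes.
final class PatchedTableNormsSketch {
    final long[] table;             // most common values, indexed by ordinal
    final int[] ordinals;           // one ordinal per doc; table.length means "other"
    final Map<Integer, Long> patch; // doc id -> value, only for "other" docs

    PatchedTableNormsSketch(long[] table, int[] ordinals, Map<Integer, Long> patch) {
        this.table = table;
        this.ordinals = ordinals;
        this.patch = patch;
    }

    // Builds a table of at most tableSize common values plus a patch map for the rest.
    static PatchedTableNormsSketch encode(long[] norms, int tableSize) {
        // Count how often each value occurs.
        Map<Long, Integer> freq = new HashMap<>();
        for (long v : norms) {
            freq.merge(v, 1, Integer::sum);
        }

        // Sort by descending frequency, breaking ties by value for determinism.
        List<Map.Entry<Long, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((a, b) -> {
            int c = Integer.compare(b.getValue(), a.getValue());
            return c != 0 ? c : Long.compare(a.getKey(), b.getKey());
        });

        // The most common values go into the table.
        int n = Math.min(tableSize, entries.size());
        long[] table = new long[n];
        Map<Long, Integer> ordOf = new HashMap<>();
        for (int i = 0; i < n; i++) {
            table[i] = entries.get(i).getKey();
            ordOf.put(table[i], i);
        }

        // Each doc stores a table ordinal, or the reserved "other" ordinal (n)
        // with its real value recorded in the patch map.
        int[] ordinals = new int[norms.length];
        Map<Integer, Long> patch = new TreeMap<>();
        for (int doc = 0; doc < norms.length; doc++) {
            Integer ord = ordOf.get(norms[doc]);
            if (ord != null) {
                ordinals[doc] = ord;
            } else {
                ordinals[doc] = n;
                patch.put(doc, norms[doc]);
            }
        }
        return new PatchedTableNormsSketch(table, ordinals, patch);
    }

    // Reads a doc's value: table lookup for common values, patch lookup otherwise.
    long get(int doc) {
        int ord = ordinals[doc];
        return ord < table.length ? table[ord] : patch.get(doc);
    }
}
```

In this sketch, a table of three common values plus the reserved "other" ordinal yields four distinct ordinals, so each document would need only two bits if the ordinals were bit-packed. The rare values pay the extra cost of the patch lookup.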

Attachments

LUCENE-6030.patch
28/Oct/14 18:38
33 kB
Ryan Ernst

People

Assignee: Ryan Ernst

Reporter: Ryan Ernst

Votes: 0

Watchers: 5

Dates

Created: 28/Oct/14 18:20

Updated: 28/Aug/22 14:18

Resolved: 31/Oct/14 20:25