Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      For both NUMERIC fields and ordinals of SORTED fields, we store data densely: every document gets a slot whether or not it has a value. As a consequence, if only 1000 documents out of 1B have a value and 8 bits are enough to store those 1000 numbers, we will require about 1GB of storage (one byte for each of the 1B documents) rather than the roughly 1KB that the actual values would need.

      I suspect this mostly happens in abuse cases, but it's still a pity that we explode storage requirements. We could try to detect sparsity and compress accordingly.

      Attachments

      1. LUCENE-6863.patch (38 kB), Adrien Grand
      2. LUCENE-6863.patch (35 kB), Adrien Grand
      3. LUCENE-6863.patch (33 kB), Adrien Grand

        Issue Links

          Activity

          Adrien Grand added a comment -

           Here is a patch. It detects sparsity by checking whether less than 10% of documents have a value. When this happens, it stores the doc IDs of documents that have a value using a DirectMonotonicWriter, and the non-missing values as a regular numeric field (so values will still be compressed with table/gcd compression if applicable).
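
           To make the encoding concrete, here is a rough write-side sketch (an illustration only, not code from the patch; the helper name, arguments, and block shift are assumptions). The doc IDs that carry a value form a strictly increasing sequence, which DirectMonotonicWriter compresses to a small number of bits per entry; the non-missing values are then written as a dense numeric field of that smaller length.

             import java.io.IOException;
             import org.apache.lucene.store.IndexOutput;
             import org.apache.lucene.util.packed.DirectMonotonicWriter;

             // Hypothetical helper: record which documents have a value.
             static void writeDocsWithField(IndexOutput meta, IndexOutput data,
                                            long[] sortedDocsWithField) throws IOException {
               DirectMonotonicWriter writer = DirectMonotonicWriter.getInstance(
                   meta, data, sortedDocsWithField.length, 16 /* assumed block shift */);
               for (long docId : sortedDocsWithField) {
                 writer.add(docId); // doc IDs are added in strictly increasing order
               }
               writer.finish();
               // The values themselves are then encoded as a regular dense numeric
               // field of length sortedDocsWithField.length, so table/gcd compression
               // still applies to them.
             }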

           I first wanted to update the format javadocs but noticed that the low-level javadocs were very outdated (still documenting that sorted doc values were stored in an FST), so I decided to remove the low-level description in favour of the high-level one, which is much more interesting in my opinion.

           This is just a first iteration: tests pass, but maybe the heuristic needs to be thought through better. I will also do some benchmarking.

          Yonik Seeley added a comment -

           +1 for sparse support! Sparse fields are a fact of life for a lot of users. Sometimes they can encode the info into a common field, but it's often not straightforward to do so.

          Robert Muir added a comment -

          Here is a patch. It detects sparsity by checking if less than 10% documents have a value.

           Please put a minimum count, like 1024, where this starts to make sense, so types aren't flipping back and forth during ordinary NRT search. And where did 10% come from? It seems far too aggressive. In previous work we determined that much lower thresholds work better (e.g. 3%).

          Robert Muir added a comment -

           I think this needs to be a very low number if it's going to be "automatic", like 3%: pure abuse-case detection, just like how sparse norms worked.

           Otherwise, it needs to be a separate format in codecs/ that the user explicitly opts into.

          Adrien Grand added a comment -

          And where did 10% come from? It seems far too aggressive.

           It was just a start so that I could easily trigger usage of this compression in tests. I agree this is very likely too aggressive; this is why I said the heuristic needs to be thought through better. I will also do some benchmarking; if this ends up being much slower than regular (delta) compression, we might want an even lower threshold.

          Yonik Seeley added a comment -

          It was just a start so that I could easily trigger usage of this compression in tests.

          Adding a small constant always helps with test coverage... (n*0.05 + 5) for example.

          Varun Thacker added a comment -

          Hi Adrien,

           Great idea to use heuristics and apply this automatically. Looking forward to the benchmarks, which I never got to in LUCENE-5688.

           Is it okay if we mark LUCENE-5688 and LUCENE-4921 as duplicates of this JIRA, or could there still be plans for a specialized doc value format?

          Adrien Grand added a comment -

          Is it okay if we mark LUCENE-5688 and LUCENE-4921 as duplicate of this Jira or could there still be plans on having a specialized doc value format?

           I think it would be a bit premature to say the other JIRAs are superseded by this one; it's not clear yet whether the approach proposed here is actually a better idea and/or could make it into the default codec. I suggest we wait and see how happy the benchmarks are with this patch first.

          Adrien Grand added a comment -

           I ran some benchmarks with the GeoNames dataset, which has a few sparse fields:

          • cc2: 3.2% of documents have this field, which has 573 unique values
          • admin4: 4.3% of documents have this field, which has 102950 unique values
          • admin3: 10.2% of documents have this field, which has 73120 unique values
          • admin2: 45.3% of documents have this field, which has 30603 unique values

           First I enabled sparse compression on all fields, regardless of density, to see how this compares to the delta compression that we use by default, and then ran two kinds of queries:

          • queries on a random partition of the index, which I guess would be the case when you have true sparse fields
          • a query only on documents that have a value, which I guess would be more realistic if you store several types of data in the same index that don't have the same fields
           Field  | disk usage for ordinals | memory usage with sparse compression enabled | sort perf: MatchAllDocsQuery | sort perf: term query, 10% of docs | sort perf: term query, 1% of docs | sort perf: term query, docs with the field
           cc2    | -88% | 1680 bytes |  -27% |  +25% |  +58% | +208%
           admin4 | -86% |  568 bytes |  -20% |   +7% |  -20% | +214%
           admin3 | -67% | 1312 bytes |  +11% |  +57% |  +42% | +236%
           admin2 | +17% | 2904 bytes | +132% | +275% | +331% | +221%

          The reduction in disk usage is significant, but so is the slowdown, especially when running a query that only matches docs that have a value. However memory usage looks acceptable to me for 10M docs.

           I couldn't test with 3% since even the rarest field occurs in 3.2% of documents, but I updated the heuristic to require at least 1024 docs in the segment (as Robert suggested) and that less than 5% of docs have a value:

           Field  | memory usage due to sparse compression | sort perf: MatchAllDocsQuery | sort perf: term query, 10% of docs | sort perf: term query, 1% of docs | sort perf: term query, docs with the field
           cc2    | 1680 bytes | -10% | +34% | +62% | +214%
           admin4 |  568 bytes |  -7% | +20% | -14% | +241%
           admin3 |  576 bytes |  +9% |  +7% | +11% |  +10%
           admin2 | 1008 bytes |  +1% |  +8% |  +9% |  +11%

           To my surprise, admin2 and admin3 were still using sparse compression on some segments. The reason is that documents with sparse values are not spread uniformly across the dataset but rather clustered. I suspect this partially explains the slowdown for admin2/admin3; maybe hotspot also dislikes having more impls to deal with.

          Adrien Grand added a comment -

           Here is an updated patch that makes the sparse impl a bit more efficient when consumed in sequential order, by keeping track of the upper bound of the current window. This is the version that was used in the benchmark above.

           I also updated the heuristic to require 1024 docs in the segment and that less than 1% of docs have a value, in order to be on the safe side and only slow down abuse/exceptional cases. Even if/when this gets used for some fields, I think the slowdown is acceptable insofar as it would only slow down fast queries: if you look at the above benchmarks, when the query matches many docs (such as a MatchAllDocsQuery) this encoding is actually faster than regular delta encoding. Only queries that match a small partition of the index (so that most dv lookups require a binary search) would become slower.
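
           For reference, the check described above boils down to something like the following sketch (the method name is mine, and the exact condition in the patch may differ in detail):

             // Assumed sketch: use the sparse encoding only for segments with at
             // least 1024 documents where fewer than 1% of documents have a value.
             static boolean useSparseEncoding(int maxDoc, long numDocsWithField) {
               return maxDoc >= 1024 && numDocsWithField * 100L < maxDoc;
             }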

          Opinions?

          David Smiley added a comment -

          Did you consider a hash lookup instead of binary-search, as was done in LUCENE-5688? I just read the comments there and it seems promising for very sparse data.

           Regarding the performance trade-off in your table, I find it hard to evaluate whether it's worth it. Does +214% mean the whole query, search & top-10 doc retrieval, took over twice as long? Or is this measurement isolated to... somehow just the sort part? How fast was this query anyway? If we're making a 3ms query take 9ms then it wouldn't bother me as much as a 300ms query taking 900ms. Of course it depends on the amount of data.

          Adrien Grand added a comment -

          Did you consider a hash lookup instead of binary-search, as was done in LUCENE-5688? I just read the comments there and it seems promising for very sparse data.

           I took this approach because I have seen a couple of setups with lots of fields, many of them sparse, and the fact that sparse fields require as many resources as dense ones is a bit frustrating. The issue description focuses on disk usage, but I think memory usage is even more important. Obviously I had to increase memory usage (since numerics don't require any memory at all today), but I wanted to keep it very low. For instance, if you look at the cc2 field, about 320k documents have a value for this field and yet it only takes 1680 bytes of memory for the entire index, so about 0.005 bytes per document. With a hashtable, you would either put the data in memory, where you could hardly avoid using one or more bytes per document, or on disk, where the random-access pattern could be a problem if not everything fits into your filesystem cache. In contrast, the patch keeps memory usage very low and keeps the access pattern sequential by keeping track of the previous/next documents that have a value and using exponential search (a variant of binary search) to seek forward.
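
           To illustrate the seek-forward idea, here is a sketch of exponential search over an in-memory sorted array of doc IDs (a simplification of my own; the patch reads the doc IDs through the on-disk monotonic structure rather than an array): gallop forward from the current position until the window contains the target, then binary-search inside that window.

             import java.util.Arrays;

             // Illustrative only: return the index of the first entry in the sorted
             // array docIdsWithField that is >= targetDoc, starting at fromIndex so
             // that sequential consumption stays cheap.
             static int advance(long[] docIdsWithField, int fromIndex, long targetDoc) {
               int bound = 1;
               // Gallop: double the window until it contains the target or we hit the end.
               while (fromIndex + bound < docIdsWithField.length
                   && docIdsWithField[fromIndex + bound] < targetDoc) {
                 bound <<= 1;
               }
               int lo = fromIndex + (bound >>> 1);
               int hi = Math.min(fromIndex + bound, docIdsWithField.length - 1) + 1;
               // Binary search inside the window found by galloping.
               int idx = Arrays.binarySearch(docIdsWithField, lo, hi, targetDoc);
               return idx >= 0 ? idx : -idx - 1; // insertion point if not present
             }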

          Does +214% mean the whole query, search & top-10 doc retrieval took over twice as long?

          I computed

          (${new response time} - ${old response time}) / ${old response time}

           so it actually means more than 3x slower. However, this does not include loading stored fields; it is just the time to compute the top document by calling IndexSearcher.search(query, 1, sort).
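
           As a concrete example of the formula: at +214%, a sort that took 6 ms before would take about 6 × (1 + 2.14) ≈ 19 ms with the patch.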

          How fast was this query any way?

           Here are the times it takes to run these queries on trunk (in milliseconds).

           Field  | sort time: MatchAllDocsQuery | sort time: term query, 10% of docs | sort time: term query, 1% of docs | sort time: term query, docs with the field
           cc2    | 128 | 20 | 2 |  6
           admin4 | 122 | 19 | 3 |  7
           admin3 | 116 | 18 | 3 | 17
           admin2 | 121 | 19 | 3 | 78

           Comparing this with the earlier table of relative changes: the queries that we are speeding up are among the slower ones, and those that we are slowing down are among the faster ones.

          David Smiley added a comment -

           For instance, if you look at the cc2 field, about 320k documents have a value for this field and yet it only takes 1680 bytes of memory for the entire index, so about 0.005 bytes per document. With a hashtable, you would either put the data in memory, where you could hardly avoid using one or more bytes per document, or on disk, where the random-access pattern could be a problem if not everything fits into your filesystem cache.

          If it's sparse (thus small) and the user has chosen to use docValues because it's going to be sorted/faceted/etc (thus it's likely 'hot' i.e. used) then I think it's reasonable to expect it'll be in the filesystem cache?

           Nonetheless what you've done here is good. I think that an in-memory hash would be a nice alternative for those who opt into the trade-off. On a semi-related note, I wonder how well a 4-byte int FST -> long would perform.

           Here are the times it takes to run these queries on trunk (in milliseconds).

          Ah; looks like a fair trade-off to me. Thanks for these details.

          Erick Erickson added a comment -

           Bypassing the question of whether one should supersede the other, but linking them so they can be found more easily.

          Adrien Grand added a comment -

          Updated patch that:

          • makes the code a bit more readable and adds comments
          • avoids loading a slice for values when only docs with field are requested
          • saves some monotonic lookups

           Here are updated benchmark results (still with a threshold of 5% for benchmarking purposes, even though the patch itself keeps the 1% threshold), computed exactly the same way as above. The slowdown is a bit more contained now. Times are in ms.

           Field  | sort perf: MatchAllDocsQuery | sort perf: term query, 10% of docs | sort perf: term query, 1% of docs | sort perf: term query, docs with the field
           cc2    | 128→99 (-23%) | 21.8→23.8 (+9%) | 2.92→4.33 (+48%) | 6.84→13.0 (+90%)
           admin4 | 121→98 (-19%) | 21.4→21.1 (-1%) | 3.65→2.81 (-23%) | 8.36→16.6 (+98%)
           admin3 | 116→125 (+1%) | 20.6→20.0 (-3%) | 3.20→3.24 (+1%)  | 18.9→19.4 (+8%)
           admin2 | 124→132 (+6%) | 21.5→20.6 (-4%) | 3.30→3.49 (+6%)  | 8.58→8.64 (+1%)

          I think the change is good to go, but I know this can be controversial. Please let me know if you have concerns, otherwise I plan to commit it by the end of the week.

          ASF subversion and git services added a comment -

          Commit 1712957 from Adrien Grand in branch 'dev/trunk'
          [ https://svn.apache.org/r1712957 ]

          LUCENE-6863: Optimized storage requirements of doc values fields when less than 1% of documents have a value.

          ASF subversion and git services added a comment -

          Commit 1712973 from Adrien Grand in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1712973 ]

          LUCENE-6863: Optimized storage requirements of doc values fields when less than 1% of documents have a value.


            People

             • Assignee: Adrien Grand
             • Reporter: Adrien Grand
             • Votes: 1
             • Watchers: 9
