Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Today, we store all skip data as a separate blob at the end of a given term's postings (if that term occurs in enough docs to warrant skip data).
But this adds overhead during decoding: we have to seek to a different place for the initial load, we have to init separate readers, we have to seek again while using the lower levels of the skip data, etc. Also, we have to fully decode all skip information even if we are not going to use it (e.g. if I only want docIDs, I still must decode the position offset and lastPayloadLength).
If instead we interleaved skip data into the postings file, we could keep it local, and "private" to each file that needs skipping. This should make it less costly to init and then use the skip data, which would be a good perf gain for e.g. PhraseQuery, AndQuery.
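To make the idea concrete, here is a minimal sketch of interleaving, not Lucene's actual postings format: doc deltas are written in blocks, and a tiny skip entry (a sentinel plus the last docID of the upcoming block) is written inline before each block, so a reader advancing to a target can jump over whole blocks without decoding their deltas and without any separate skip stream. All names (`InterleavedSkipDemo`, `SKIP_MARKER`, `SKIP_INTERVAL`) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class InterleavedSkipDemo {
    static final int SKIP_INTERVAL = 4;   // docs per block (illustrative)
    static final int SKIP_MARKER = -1;    // sentinel; deltas are always >= 1

    // Encode docIDs as deltas. Before every block after the first, write
    // [SKIP_MARKER, lastDocIDOfThatBlock] inline in the same stream.
    static List<Integer> encode(int[] docIDs) {
        List<Integer> out = new ArrayList<>();
        int prev = 0;
        for (int i = 0; i < docIDs.length; i++) {
            if (i > 0 && i % SKIP_INTERVAL == 0) {
                int blockEnd = Math.min(i + SKIP_INTERVAL, docIDs.length) - 1;
                out.add(SKIP_MARKER);
                out.add(docIDs[blockEnd]); // last docID of the block that follows
            }
            out.add(docIDs[i] - prev);
            prev = docIDs[i];
        }
        return out;
    }

    // Advance to the first docID >= target. When the inline skip entry shows
    // the next block ends before target, skip its deltas without decoding.
    static int advance(List<Integer> postings, int target) {
        int doc = 0;
        int i = 0;
        while (i < postings.size()) {
            if (postings.get(i) == SKIP_MARKER) {
                int blockLast = postings.get(i + 1);
                if (blockLast < target) {
                    i += 2 + SKIP_INTERVAL; // jump over the whole block
                    doc = blockLast;
                    continue;
                }
                i += 2; // skip entry not useful; fall through and decode
            }
            doc += postings.get(i++);
            if (doc >= target) {
                return doc;
            }
        }
        return Integer.MAX_VALUE; // no more docs
    }

    public static void main(String[] args) {
        int[] docIDs = {2, 5, 7, 9, 14, 20, 26, 30, 41, 50};
        List<Integer> postings = encode(docIDs);
        System.out.println(advance(postings, 25)); // prints 26
        System.out.println(advance(postings, 45)); // prints 50
    }
}
```

A reader that only wants docIDs pays just the two-int skip entry per block as it scans by, rather than decoding a full multi-level skip blob up front; that is the locality win the proposal is after.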