[LUCENE-5052] bitset codec for off heap filters - ASF JIRA


Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: core/codecs
Labels:
- features
- perfomance

Lucene Fields:

New

Description

Colleagues,

When we filter, we don't care about any scoring factors (norms, positions, tf), but filtering should be fast. The obvious way to handle this is to decode the postings list and cache it on heap (CachingWrapperFilter, Solr's DocSet). Both the heap consumption and the decoding are expensive.
Let's write a postings list as a bitset if df is greater than the segment's maxDoc/8 (what about skip lists? And overall performance?).
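A minimal sketch of the density rule, in plain Java rather than Lucene's codec API (the class and method names are hypothetical and not taken from the attached patches): a term's postings become a bitset of maxDoc bits packed into 64-bit words when df exceeds maxDoc/8. At that density the bitset (maxDoc/8 bytes) is no larger than any encoding that spends at least one byte per posting, such as vInt deltas; block-packed postings can be denser, so the threshold is a heuristic. In this sketch, advance-style lookup computes the word index directly and uses Long.numberOfTrailingZeros to find the next set bit, so no skip data is involved.

```java
import java.util.Arrays;

/** Hypothetical sketch (not Lucene API): dense postings stored as a bitset. */
public class BitsetPostingsSketch {

    /** Density rule from the proposal: use a bitset when df > maxDoc / 8. */
    static boolean useBitset(int df, int maxDoc) {
        return df > maxDoc / 8;
    }

    /** Packs sorted doc IDs in [0, maxDoc) into ceil(maxDoc / 64) words. */
    static long[] toBitset(int[] docs, int maxDoc) {
        long[] words = new long[(maxDoc + 63) >>> 6];
        for (int doc : docs) {
            // Java uses only the low 6 bits of the shift count, i.e. doc % 64.
            words[doc >>> 6] |= 1L << doc;
        }
        return words;
    }

    /** advance()-style lookup: first doc >= target, or -1 when exhausted. */
    static int nextSetDoc(long[] words, int target, int maxDoc) {
        if (target >= maxDoc) {
            return -1;
        }
        int i = target >>> 6;
        // Clear the bits below target in its word; the word index is computed directly.
        long word = words[i] & (-1L << target);
        while (true) {
            if (word != 0) {
                return (i << 6) + Long.numberOfTrailingZeros(word);
            }
            if (++i == words.length) {
                return -1;
            }
            word = words[i];
        }
    }

    /** Decodes the bitset back into sorted doc IDs. */
    static int[] decode(long[] words, int maxDoc) {
        int[] out = new int[maxDoc];
        int n = 0;
        for (int doc = nextSetDoc(words, 0, maxDoc); doc != -1;
                doc = nextSetDoc(words, doc + 1, maxDoc)) {
            out[n++] = doc;
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int maxDoc = 100;
        int[] docs = {0, 3, 64, 65, 99};
        System.out.println(useBitset(docs.length, maxDoc)); // false: 5 <= 100 / 8
        long[] words = toBitset(docs, maxDoc);
        System.out.println(Arrays.toString(decode(words, maxDoc))); // [0, 3, 64, 65, 99]
    }
}
```

The example list is deliberately sparse so the rule rejects it; the round trip still shows how the words are laid out.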
Besides the codec implementation, the trickiest part for me is designing an API for this. How can we let the app know that a term query doesn't need to be cached on heap but can be held as an mmapped bitset?
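On the API question, the practical difference is where the bits live. A minimal sketch, assuming a hypothetical flat file of little-endian 64-bit words (not the format of the attached patches and not Lucene's Directory/IndexInput API), uses plain java.nio to map the bitset read-only and test membership, so a filter check reads from the OS page cache rather than from a heap long[] built by decoding postings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: a filter bitset read through mmap instead of a heap long[]. */
public class MmapBitsetSketch {

    /** Writes bitset words as little-endian longs (the file format is an assumption). */
    static void write(Path file, long[] words) {
        ByteBuffer buf = ByteBuffer.allocate(words.length * Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.asLongBuffer().put(words);
        try {
            Files.write(file, buf.array());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps the file read-only; pages are served by the OS cache, not the Java heap. */
    static LongBuffer map(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size())
                    .order(ByteOrder.LITTLE_ENDIAN)
                    .asLongBuffer();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Membership test for one doc, as a filter would need. */
    static boolean get(LongBuffer words, int doc) {
        return (words.get(doc >>> 6) & (1L << doc)) != 0;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("filter", ".bits");
        file.toFile().deleteOnExit();
        long[] words = new long[2];
        words[0] |= 1L << 3;  // doc 3
        words[1] |= 1L << 35; // doc 64 + 35 = 99
        write(file, words);
        LongBuffer bits = map(file);
        // prints "true false true"
        System.out.println(get(bits, 3) + " " + get(bits, 4) + " " + get(bits, 99));
    }
}
```

Inside Lucene this access would presumably go through MMapDirectory rather than a raw FileChannel; the sketch only illustrates the access pattern, not a proposed API.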

WDYT?

Attachments


bitsetcodec.zip
13/Mar/14 14:57
25 kB
Dr Oleg Savrasov
bitsetcodec.zip
14/Nov/13 13:00
15 kB
Yury Pakhomov
LUCENE-5052.patch
26/Feb/14 13:57
20 kB
Nina Gracheva
LUCENE-5052-1.patch
31/Mar/14 15:55
19 kB
Dr Oleg Savrasov

Issue Links

is blocked by

LUCENE-5123 invert the codec postings API

Closed

relates to

LUCENE-5084 EliasFanoDocIdSet

Closed


People

Assignee: Unassigned

Reporter: Mikhail Khludnev

Votes: 3

Watchers: 22

Dates

Created: 11/Jun/13 19:00

Updated: 28/Aug/22 13:47

Resolved: 18/Mar/15 09:44