Lucene - Core: LUCENE-4227

DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-BETA, 5.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This postings format just wraps Lucene40 (on disk), but then at search
      time it loads (up front) all terms' postings into RAM.

      You'd use this if you have insane amounts of RAM and want the fastest
      possible search performance. The postings are not compressed: docIDs
      and positions are stored as straight int[]s.

      The terms are stored as a skip list (array of byte[]), but I packed
      all terms together into a single long byte[]: I had started with an
      actual separate byte[] per term, but the added pointer deref and loss
      of locality was a lot (~2X) slower for terms-dict intensive queries
      like FuzzyQuery.
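The single packed byte[] idea can be sketched roughly as follows. This is an illustrative simplification, not the patch's code: it uses a plain binary search over a parallel offsets array instead of the skip list, and the names (PackedTerms, termBytes, termOffsets) are invented for the example. The point it shows is comparing terms in place, without materializing a per-term byte[].

```java
// Illustrative sketch (not Lucene's actual internals): all terms concatenated
// into one byte[], with termOffsets[i] marking where term i starts.
class PackedTerms {
  private final byte[] termBytes;   // all terms, concatenated, in sorted order
  private final int[] termOffsets;  // length = termCount + 1; last entry = termBytes.length

  PackedTerms(byte[] termBytes, int[] termOffsets) {
    this.termBytes = termBytes;
    this.termOffsets = termOffsets;
  }

  int termCount() {
    return termOffsets.length - 1;
  }

  /** Compare term i against a target, byte by byte, without copying the term. */
  int compare(int i, byte[] target) {
    int start = termOffsets[i], end = termOffsets[i + 1];
    int len = Math.min(end - start, target.length);
    for (int j = 0; j < len; j++) {
      int diff = (termBytes[start + j] & 0xFF) - (target[j] & 0xFF);
      if (diff != 0) return diff;
    }
    return (end - start) - target.length;
  }

  /** Binary search for a term; returns its ordinal, or -1 if absent. */
  int find(byte[] target) {
    int lo = 0, hi = termCount() - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int cmp = compare(mid, target);
      if (cmp < 0) lo = mid + 1;
      else if (cmp > 0) hi = mid - 1;
      else return mid;
    }
    return -1;
  }
}
```

Because every comparison walks a single contiguous array, lookups stay cache-friendly, which is the locality win the ~2X number above refers to.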

      Low frequency postings (docFreq <= 32 by default) store all docs, pos
      and offsets into a single int[]. High frequency postings store docs
      as int[], freqs as int[], and positions as int[][] parallel arrays.
      For skipping I just do a growing binary search.
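The "growing binary search" skip can be sketched like this. Again an illustrative sketch, not the patch's code: the names (DirectDocsEnum, docIDs, upto) are invented, and only the docs array is shown. advance() first gallops forward in exponentially growing steps from the current position, then binary-searches the bracketed window, so skipping near the current doc stays cheap while long skips remain O(log n).

```java
// Illustrative sketch (not the actual DirectPostingsFormat code): skipping
// over an in-RAM sorted int[] of docIDs via gallop + binary search.
class DirectDocsEnum {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  private final int[] docIDs; // all docs for one term, sorted, held in RAM
  private int upto = -1;      // index of the current doc

  DirectDocsEnum(int[] docIDs) {
    this.docIDs = docIDs;
  }

  int nextDoc() {
    upto++;
    return upto < docIDs.length ? docIDs[upto] : NO_MORE_DOCS;
  }

  int advance(int target) {
    // Gallop: grow the jump until we bracket target (or run off the end).
    int lo = upto + 1;
    int hi = lo;
    int step = 1;
    while (hi < docIDs.length && docIDs[hi] < target) {
      lo = hi + 1;
      hi += step;
      step <<= 1; // doubling steps: 1, 2, 4, 8, ...
    }
    hi = Math.min(hi, docIDs.length - 1);
    // Binary search within [lo, hi] for the first doc >= target.
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (docIDs[mid] < target) lo = mid + 1;
      else hi = mid - 1;
    }
    if (lo >= docIDs.length) return NO_MORE_DOCS;
    upto = lo;
    return docIDs[lo];
  }
}
```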

      I also made specialized DirectTermScorer and DirectExactPhraseScorer
      for the high freq case that just pull the int[] and iterate
      themselves.

      All tests pass.

      1. LUCENE-4227.patch
        81 kB
        Michael McCandless
      2. LUCENE-4227.patch
        74 kB
        Michael McCandless
      3. LUCENE-4227.patch
        91 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Patch.
        Michael McCandless added a comment -

        I ran perf tests on a 2M Wikipedia index (requires 8 GB heap: need
        more RAM to go higher!).

        Results without the specialized scorers (baseline is trunk w/ MMapDir):

                        Task    QPS base  StdDev base  QPS direct  StdDev direct      Pct diff
                    PKLookup      259.28       11.94      227.96        5.85  -18% -   -5%
                      Fuzzy1      160.21        5.11      183.91        1.48   10% -   19%
                 TermGroup1M       18.33        0.21       21.60        0.11   15% -   19%
                    SpanNear        5.79        0.16        6.86        0.31   10% -   27%
                TermBGroup1M       18.46        0.24       22.16        0.11   17% -   22%
              TermBGroup1M1P       22.47        0.65       28.04        0.67   18% -   31%
                SloppyPhrase        3.51        0.13        4.60        0.05   24% -   37%
                      IntNRQ       53.75        4.68       71.22        4.21   14% -   53%
                  OrHighHigh       18.85        0.42       26.89        2.16   28% -   57%
                   OrHighMed       37.93        0.91       54.57        5.71   25% -   62%
                     Respell      167.73        5.37      242.93        1.78   39% -   50%
                    Wildcard       46.64        1.74       69.98        3.43   37% -   63%
                     Prefix3      109.51        3.45      165.77        6.42   41% -   62%
                      Fuzzy2       56.48        2.37       88.25        0.91   48% -   64%
                 AndHighHigh       24.59        0.74       41.82        0.72   62% -   78%
                      Phrase       12.57        0.20       21.89        0.71   65% -   82%
                        Term       39.05        1.74       69.00        3.68   60% -   94%
                  AndHighMed      126.87        2.48      261.73        4.19   99% -  113%
        

        Nice speedups!

        Same run, but using trunk w/ RAMDirectory as the baseline:

                        Task    QPS base  StdDev base  QPS direct  StdDev direct      Pct diff
                    PKLookup      248.50        4.73      222.03        4.43  -14% -   -7%
                      Fuzzy1      159.41        3.65      185.32        3.15   11% -   21%
                    SpanNear        5.74        0.08        6.75        0.17   13% -   22%
                 TermGroup1M       17.78        0.42       21.03        0.68   11% -   25%
                TermBGroup1M       19.32        0.58       23.08        1.02   10% -   28%
                      IntNRQ       46.82        0.49       56.12        1.28   15% -   23%
              TermBGroup1M1P       23.27        0.46       30.14        0.91   23% -   36%
                     Respell      163.36        3.42      221.10        2.48   31% -   39%
                   OrHighMed       30.62        1.94       42.94        5.70   14% -   69%
                  OrHighHigh       17.98        0.99       25.69        3.35   17% -   70%
                     Prefix3      114.41        0.67      164.19        2.22   40% -   46%
                    Wildcard       47.58        0.36       70.47        1.20   44% -   51%
                      Fuzzy2       53.92        1.37       83.54        2.66   46% -   64%
                SloppyPhrase        5.07        0.23        8.12        0.74   39% -   82%
                 AndHighHigh       24.73        0.75       40.51        0.42   57% -   70%
                      Phrase       14.02        0.07       23.42        0.30   64% -   69%
                        Term       39.96        2.13       67.39        4.09   50% -   88%
                  AndHighMed      132.66        3.24      274.07        1.64  100% -  113%
        

        Still good speedups over the "obvious hold index in RAM" option.

        Then, just testing the specialized scorers (baseline = DirectPF without
        specialized scorers):

                        Task    QPS base  StdDev base  QPS direct  StdDev direct      Pct diff
                      IntNRQ       74.86        3.42       71.72        0.27   -8% -    0%
                    Wildcard       62.88        2.34       60.52        0.49   -7% -    0%
                     Prefix3      102.46        3.98       98.92        0.85   -7% -    1%
                 AndHighHigh       51.41        1.96       50.26        1.10   -7% -    3%
                  AndHighMed      238.18        5.17      234.14        2.83   -4% -    1%
                      Fuzzy1      179.64        1.73      177.96        3.27   -3% -    1%
                SloppyPhrase        8.97        0.37        8.93        0.48   -9% -    9%
                     Respell      223.76        1.16      222.79        2.68   -2% -    1%
                      Fuzzy2       79.62        1.38       79.31        0.90   -3% -    2%
                    SpanNear        6.83        0.25        6.89        0.31   -7% -    9%
                    PKLookup      220.25        1.46      225.17        2.56    0% -    4%
                   OrHighMed       50.70        4.27       53.20        3.95  -10% -   23%
                 TermGroup1M       22.17        0.33       23.42        0.37    2% -    8%
                TermBGroup1M       24.45        0.44       25.86        0.21    3% -    8%
              TermBGroup1M1P       30.61        0.82       32.76        0.12    3% -   10%
                        Term       68.69        3.99       74.88        0.28    2% -   16%
                  OrHighHigh       26.61        1.95       29.07        2.24   -6% -   26%
                      Phrase       13.81        0.17       15.96        0.13   13% -   17%
        

        Reasonable but not immense speedups by specializing query
        scorers.

        Michael McCandless added a comment -

        New patch, fixing previous nocommits / downgrading to TODOs. I also removed the specialized scorers since they seem not to help much.

        All tests pass, but I still need to fix all tests that now avoid MemoryPF to also avoid DirectPF. Otherwise I think it's ready...

        Robert Muir added a comment -

        Would it really be that much slower if it was slightly more reasonable, e.g. storing freqs
        in packed ints (with super-duper fast options) instead of wasting so much space on them?

        Michael McCandless added a comment -

        Would it really be that much slower if it was slightly more reasonable, e.g. storing freqs
        in packed ints (with super-duper fast options) instead of wasting so much space on them?

        Probably not that much slower? I think that's a good idea!

        But I think we can explore this after committing? There are other things we can try too (e.g. collapse the skip list into a shared int[], which I think may give a perf gain; collapse positions; etc.).
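The packed-ints idea under discussion can be sketched in plain Java as follows. This is not Lucene's actual PackedInts API, just a minimal hand-rolled illustration with invented names: each freq occupies only as many bits as the largest value needs, instead of a full 32-bit int.

```java
// Illustrative bit-packing sketch (not Lucene's PackedInts class): freqs are
// packed back-to-back into a long[], bitsPerValue bits each.
class PackedFreqs {
  private final long[] blocks;
  private final int bitsPerValue;

  PackedFreqs(int[] freqs) {
    int max = 1;
    for (int f : freqs) max = Math.max(max, f);
    this.bitsPerValue = 64 - Long.numberOfLeadingZeros(max);
    this.blocks = new long[((long) freqs.length * bitsPerValue + 63) / 64 > 0
        ? (int) (((long) freqs.length * bitsPerValue + 63) / 64) : 1];
    for (int i = 0; i < freqs.length; i++) set(i, freqs[i]);
  }

  private void set(int index, long value) {
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    blocks[block] |= value << shift;
    if (shift + bitsPerValue > 64) {        // value straddles two longs
      blocks[block + 1] |= value >>> (64 - shift);
    }
  }

  int get(int index) {
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    long value = blocks[block] >>> shift;
    if (shift + bitsPerValue > 64) {        // pull the high bits from the next long
      value |= blocks[block + 1] << (64 - shift);
    }
    return (int) (value & ((1L << bitsPerValue) - 1));
  }
}
```

For typical freq distributions (mostly small values), this cuts the per-value cost from 32 bits to something closer to the entropy of the data, at the price of a shift-and-mask on each read.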

        Robert Muir added a comment -

        Yeah, i don't think we need to solve it before committing.

        I do think maybe this class needs some more warnings, to me it seems it will use crazy amounts of RAM.
        I also am not sure I like the name "Direct"... is it crazy to suggest "Instantiated"?

        Michael McCandless added a comment -

        I do think maybe this class needs some more warnings, to me it seems it will use crazy amounts of RAM.

        I'll add some scary warnings.

        I also am not sure I like the name "Direct"... is it crazy to suggest "Instantiated"?

        It is very much like the old Instantiated (though I think its terms dict is faster than Instantiated's)... but I didn't really like the name "Instantiated"... I had picked Direct because it "directly" represents the postings... but maybe we can find a better name.

        I will update MIGRATE.txt to explain how "Direct" (or whatever we name it) is the closest match if you were previously using Instantiated...

        Robert Muir added a comment -

        It is very much like the old instantiated (though I think its terms dict is faster than instantiated's)... but I didn't really like the name "Instanstiated"... I had picked Direct because it "directly" represents the postings ... but maybe we can find a better name.

        OK, I think what would be better is a better synonym for "Uncompressed". I realized Direct is consistent with packed ints
        or whatever... but I don't think it should use this name either; it's not intuitive.

        Michael McCandless added a comment -

        New patch, adding scary warning & MIGRATE.txt entry, fixing javadoc errors, and adding lucene.experimental ... still haven't thought of another name yet ...

        Robert Muir added a comment -

        I don't have a better name either. Let's just commit it with this one and think about it later!


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0
            Watchers:
            3
