This postings format just wraps Lucene40 (on disk), but at search
time it loads all terms and postings into RAM up front.
You'd use this if you have insane amounts of RAM and want the fastest
possible search performance. The postings are not compressed: docIds
and positions are stored as straight ints.
The terms are stored as a skip list (array of byte[]), but I packed
all terms together into a single long byte[]: I had started with an
actual separate byte[] per term, but the added pointer deref and loss
of locality made that a lot (~2X) slower for terms-dict intensive queries.
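A minimal sketch of the packing idea, assuming sorted terms concatenated into one byte[] with a parallel offsets array. The class and field names (PackedTermsSketch, termBytes, termOffsets) are hypothetical, and the sketch swaps the skip list for a plain binary search over the offsets, so it only illustrates the memory layout, not the actual implementation:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: all terms live in one byte[]; an int[] marks term boundaries. */
final class PackedTermsSketch {
    private final byte[] termBytes;   // every term, concatenated in sorted order
    private final int[] termOffsets;  // term i occupies [termOffsets[i], termOffsets[i + 1])

    /** Assumes sortedTerms is already sorted in unsigned UTF-8 byte order. */
    PackedTermsSketch(String[] sortedTerms) {
        byte[][] encoded = new byte[sortedTerms.length][];
        termOffsets = new int[sortedTerms.length + 1];
        int total = 0;
        for (int i = 0; i < sortedTerms.length; i++) {
            encoded[i] = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
            termOffsets[i] = total;
            total += encoded[i].length;
        }
        termOffsets[sortedTerms.length] = total;
        termBytes = new byte[total];
        for (int i = 0; i < encoded.length; i++) {
            System.arraycopy(encoded[i], 0, termBytes, termOffsets[i], encoded[i].length);
        }
    }

    /** Returns the ordinal of the term, or -1 if it is absent (plain binary search). */
    int lookup(String term) {
        byte[] target = term.getBytes(StandardCharsets.UTF_8);
        int lo = 0;
        int hi = termOffsets.length - 2;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compare(mid, target);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    /** Unsigned byte-wise comparison of the stored term at ord against target. */
    private int compare(int ord, byte[] target) {
        int start = termOffsets[ord];
        int length = termOffsets[ord + 1] - start;
        int common = Math.min(length, target.length);
        for (int i = 0; i < common; i++) {
            int diff = (termBytes[start + i] & 0xFF) - (target[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return length - target.length;
    }
}
```

Comparing a term here touches only the shared array, with no per-term object to dereference, which is the locality benefit described above.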
Low frequency postings (docFreq <= 32 by default) store all docs, positions
and offsets in a single int[]. High frequency postings store docs
as int[], freqs as int[], and positions as int arrays, all kept as parallel arrays.
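The two layouts can be sketched roughly as follows. The exact ordering inside the packed array is an assumption made for illustration (docID, then freq, then that doc's positions), offsets are left out, and all names (PostingsLayoutSketch, packLowFreq, freqInPacked, HighFreq) are hypothetical:

```java
/** Hypothetical sketch of the low-frequency and high-frequency postings layouts. */
final class PostingsLayoutSketch {

    /**
     * Low-frequency case: everything goes into one int[].
     * Assumed layout: [docID, freq, pos_0 .. pos_{freq-1}, docID, freq, ...].
     */
    static int[] packLowFreq(int[] docs, int[][] positions) {
        int size = 0;
        for (int[] docPositions : positions) {
            size += 2 + docPositions.length;
        }
        int[] packed = new int[size];
        int upto = 0;
        for (int i = 0; i < docs.length; i++) {
            packed[upto++] = docs[i];
            packed[upto++] = positions[i].length; // freq = number of positions
            for (int pos : positions[i]) {
                packed[upto++] = pos;
            }
        }
        return packed;
    }

    /** Walks the packed array linearly; returns the freq for docID, or 0 if absent. */
    static int freqInPacked(int[] packed, int docID) {
        int upto = 0;
        while (upto < packed.length) {
            int doc = packed[upto];
            int freq = packed[upto + 1];
            if (doc == docID) {
                return freq;
            }
            upto += 2 + freq; // skip this doc's header and its positions
        }
        return 0;
    }

    /** High-frequency case: parallel arrays, where index i describes the i-th doc. */
    static final class HighFreq {
        final int[] docs;
        final int[] freqs;
        final int[][] positions;

        HighFreq(int[] docs, int[][] positions) {
            this.docs = docs;
            this.positions = positions;
            this.freqs = new int[docs.length];
            for (int i = 0; i < docs.length; i++) {
                freqs[i] = positions[i].length;
            }
        }
    }
}
```

With at most 32 docs, a linear walk over the packed array stays cheap, while the parallel arrays let a scorer read docs and freqs by index without decoding anything.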
For skipping I just do a growing binary search.
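One way to read "growing binary search" is as an exponential (galloping) search: probe ahead in doubling steps until the target is bracketed, then binary search inside that window. The sketch below follows that reading, which is an assumption; GrowingSearchSketch and advance are hypothetical names:

```java
/** Hypothetical sketch of a "growing" (exponential) search over a sorted int[] of docIDs. */
final class GrowingSearchSketch {

    /**
     * Returns the smallest index >= from whose docID is >= target,
     * or docs.length if there is no such docID.
     */
    static int advance(int[] docs, int from, int target) {
        int n = docs.length;
        if (from >= n) {
            return n;
        }
        if (docs[from] >= target) {
            return from;
        }
        // Gallop: keep docs[lo] < target while doubling the step size.
        int lo = from;
        int step = 1;
        int hi = clampedAdd(lo, step, n);
        while (hi < n && docs[hi] < target) {
            lo = hi;
            step <<= 1;
            hi = clampedAdd(lo, step, n);
        }
        // The answer now lies in (lo, hi]; finish with an ordinary binary search.
        int left = lo + 1;
        int right = hi;
        while (left < right) {
            int mid = (left + right) >>> 1;
            if (docs[mid] < target) {
                left = mid + 1;
            } else {
                right = mid;
            }
        }
        return left;
    }

    /** Adds without int overflow and caps the result at limit. */
    private static int clampedAdd(int base, int step, int limit) {
        long sum = (long) base + step;
        return sum > limit ? limit : (int) sum;
    }
}
```

The cost grows with the distance skipped rather than with the length of the whole postings list, which suits the short hops that are typical when intersecting postings.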
I also made specialized DirectTermScorer and DirectExactPhraseScorer
for the high freq case that just pull the int[] and iterate.
All tests pass.
|Field||Original Value||New Value|
|Status||Open [ 1 ]||Resolved [ 5 ]|
|Fix Version/s|| ||4.0 [ 12322456 ]|
|Fix Version/s|| ||5.0 [ 12321663 ]|
|Resolution|| ||Fixed [ 1 ]|
|Status||Resolved [ 5 ]||Closed [ 6 ]|
|Transition||Time In Source Status||Execution Times||Last Executer||Last Execution Date|
|Open to Resolved||3d 18h 18m||1||Michael McCandless||20/Jul/12 15:47|
|Resolved to Closed||293d 19h 51m||1||Uwe Schindler||10/May/13 11:39|