Description
This postings format just wraps Lucene40 (on disk), but at search
time it loads (up front) all postings for all terms into RAM.
You'd use this if you have insane amounts of RAM and want the fastest
possible search performance. The postings are not compressed: docIDs and
positions are stored as straight int[]s.
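
A rough sketch of how a field could be pointed at this format, assuming it ends up registered under the SPI name "Direct" and that you're on a 4.0-style codec whose getPostingsFormatForField can be overridden (the class, field, and method names here are illustrative, not part of this patch):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene40.Lucene40Codec;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

public class DirectFormatConfig {
  /** Builds an IndexWriterConfig that uses the RAM-resident format for "body" only. */
  public static IndexWriterConfig newConfig(Analyzer analyzer) {
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40, analyzer);
    iwc.setCodec(new Lucene40Codec() {
      @Override
      public PostingsFormat getPostingsFormatForField(String field) {
        if ("body".equals(field)) {
          return PostingsFormat.forName("Direct");  // assumed SPI name
        }
        // Everything else stays on the default on-disk format.
        return super.getPostingsFormatForField(field);
      }
    });
    return iwc;
  }
}
```

Only hot, heavily queried fields need the RAM-resident format; the rest can stay on the default format to keep heap usage sane.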
The terms are stored as a skip list (array of byte[]), but I packed
all terms together into a single long byte[]: I had started with an
actual separate byte[] per term, but the added pointer deref and loss
of locality made it a lot (~2X) slower for terms-dict-intensive
queries like FuzzyQuery.
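
To illustrate the packing (hypothetical class and field names, not the actual code in the patch): one contiguous byte[] plus a parallel offsets array, so comparing a term never has to chase a per-term pointer or materialize a per-term byte[]:

```java
// Sketch only: all term bytes concatenated in sorted order, addressed by offset.
class PackedTerms {
  final byte[] termBytes;    // all terms' bytes, back to back
  final int[] termOffsets;   // termOffsets[i] = start of term i; length = numTerms + 1

  PackedTerms(byte[] termBytes, int[] termOffsets) {
    this.termBytes = termBytes;
    this.termOffsets = termOffsets;
  }

  /** Compares term i against the target bytes directly inside the packed buffer. */
  int compare(int i, byte[] target) {
    int start = termOffsets[i];
    int end = termOffsets[i + 1];
    int len = Math.min(end - start, target.length);
    for (int j = 0; j < len; j++) {
      int cmp = (termBytes[start + j] & 0xFF) - (target[j] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return (end - start) - target.length;
  }
}
```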
Low frequency postings (docFreq <= 32 by default) store all docs,
positions, and offsets in a single int[]. High frequency postings store
docs as an int[], freqs as an int[], and positions as an int[][], in
parallel arrays.
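
Roughly, the two in-RAM shapes look like this (illustrative names only, not the actual classes):

```java
class LowFreqTerm {
  // docFreq <= 32 (default cutoff): docs, positions, and offsets packed into
  // one flat int[], walked sequentially at search time.
  int[] postings;
}

class HighFreqTerm {
  // docFreq > 32: parallel arrays, one slot per document.
  int[] docIDs;       // sorted ascending
  int[] freqs;        // freqs[i] = term frequency in docIDs[i]
  int[][] positions;  // positions[i] = positions of the term within docIDs[i]
}
```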
For skipping I just do a growing binary search.
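
A minimal sketch of that skipping idea over a sorted docID int[] (a hypothetical helper, not the actual enum code): grow the search window exponentially from the current position until it brackets the target, then binary-search inside it:

```java
// Assumes the usual advance() contract: target is greater than the current docID.
static int advance(int[] docIDs, int cur, int target) {
  int low = cur + 1;
  int step = 1;
  int high = low + step;
  // Grow the upper bound until docIDs[high] >= target or we run off the end.
  while (high < docIDs.length && docIDs[high] < target) {
    low = high + 1;
    step <<= 1;
    high = low + step;
  }
  if (high >= docIDs.length) {
    high = docIDs.length - 1;
  }
  // Binary search for the first docID >= target within [low, high].
  while (low <= high) {
    int mid = (low + high) >>> 1;
    if (docIDs[mid] < target) {
      low = mid + 1;
    } else {
      high = mid - 1;
    }
  }
  return low;  // index of first doc >= target, or docIDs.length if exhausted
}
```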
I also made specialized DirectTermScorer and DirectExactPhraseScorer
for the high-freq case that just pull the int[]s and iterate over them
directly.
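
To show why that helps, here is a sketch of the kind of tight loop such a scorer can run once docs and freqs are plain int[]s; the scoring formula and score accumulation below are purely illustrative, not the actual DirectTermScorer:

```java
// With the arrays already in RAM there is no next()/decode overhead per doc.
static void scoreAll(int[] docIDs, int[] freqs, float weight, float[] scores) {
  for (int i = 0; i < docIDs.length; i++) {
    // scores is indexed by docID here only for illustration.
    scores[docIDs[i]] += weight * (float) Math.sqrt(freqs[i]);
  }
}
```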
All tests pass.