Do WAH8 and PFOR already have an index?
They do, but the index is naive: it is a plain binary search over a subset of the (docID,position) pairs contained in the set. With the first versions of these DocIdSets, I just wanted to guarantee O(log(cardinality)) advance performance.
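To make the idea concrete, here is a minimal sketch of such a sampled index. It is not the code of either DocIdSet: the class name, the `interval` parameter and the use of a plain sorted `int[]` in place of the compressed stream are all assumptions for illustration. In a real compressed set the stored position would be an offset into the encoded data rather than an array index.

```java
// Hypothetical sketch: every `interval`-th (docID, position) pair is sampled,
// and advance() binary-searches the samples before scanning linearly.
final class SampledIndexSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] sampleDocIds;    // docID of each sampled entry
    private final int[] samplePositions; // position of that entry in the set

    SampledIndexSketch(int[] sortedDocIds, int interval) {
        int n = (sortedDocIds.length + interval - 1) / interval;
        sampleDocIds = new int[n];
        samplePositions = new int[n];
        for (int i = 0; i < n; i++) {
            samplePositions[i] = i * interval;
            sampleDocIds[i] = sortedDocIds[i * interval];
        }
    }

    /** Position of the last sample whose docID is <= target, or 0 if there is none. */
    int startPosition(int target) {
        int lo = 0, hi = sampleDocIds.length - 1, best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (sampleDocIds[mid] <= target) {
                best = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best < 0 ? 0 : samplePositions[best];
    }

    /** First docID >= target, scanning at most about `interval` entries after the lookup. */
    static int advance(int[] sortedDocIds, SampledIndexSketch index, int target) {
        for (int pos = index.startPosition(target); pos < sortedDocIds.length; pos++) {
            if (sortedDocIds[pos] >= target) {
                return sortedDocIds[pos];
            }
        }
        return NO_MORE_DOCS;
    }
}
```

The binary search costs O(log(number of samples)) and the linear scan is bounded by the sampling interval, so the interval trades index size against scan length.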
Block decoding might still be added to EliasFano, which should improve its nextDoc() performance.
I see these sets being used mainly as filters, so I think advance() performance is more important.
The Elias-Fano code is not tuned yet, so I'm surprised that the Elias-Fano time for nextDoc() is less than a factor of two worse than PFOR.
Well, the PFOR doc ID set is not tuned either. But I agree this is a good surprise for the Elias-Fano set. I mean even the WAH8 doc id set should be pretty fast and is still slower than the Elias-Fano set.
Another surprise is that Elias-Fano is best at advance() among the compressed sets for some cases. That means that Long.bitCount() is doing well on the upper bits.
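For readers unfamiliar with why Long.bitCount() matters here, the following is a rough sketch and not the actual decoder. It assumes the usual Elias-Fano layout in which element i with high part h sets bit h + i in the upper-bits array; the class and method names are made up for illustration. Whole 64-bit words are skipped by counting their set bits, and only the final word is scanned bit by bit.

```java
// Hypothetical sketch of skipping over Elias-Fano upper bits with Long.bitCount().
final class UpperBitsSkipSketch {

    /** Encodes the high parts (value >>> lowBits) of sorted values in unary. */
    static long[] encodeUpper(int[] sortedValues, int lowBits) {
        int n = sortedValues.length;
        long maxHigh = n == 0 ? 0 : (sortedValues[n - 1] >>> lowBits);
        long[] upper = new long[(int) ((maxHigh + n) / 64) + 1];
        for (int i = 0; i < n; i++) {
            long pos = (sortedValues[i] >>> lowBits) + i; // bit index of element i
            upper[(int) (pos >>> 6)] |= 1L << (int) (pos & 63);
        }
        return upper;
    }

    /** Number of elements whose high part is smaller than highTarget. */
    static int countBelow(long[] upper, long highTarget) {
        if (highTarget == 0) {
            return 0;
        }
        long zerosNeeded = highTarget; // stop at the highTarget-th zero bit
        int ones = 0;                  // elements seen so far
        for (long word : upper) {
            int onesInWord = Long.bitCount(word);
            int zerosInWord = 64 - onesInWord;
            if (zerosInWord < zerosNeeded) {
                // The wanted zero is not in this word: skip it in one step.
                zerosNeeded -= zerosInWord;
                ones += onesInWord;
                continue;
            }
            // The wanted zero is in this word: finish with a bit-by-bit scan.
            for (int b = 0; b < 64; b++) {
                if (((word >>> b) & 1L) == 1L) {
                    ones++;
                } else if (--zerosNeeded == 0) {
                    return ones;
                }
            }
        }
        return ones; // every element has a high part below highTarget
    }
}
```

The returned count is the index of the first candidate element for an advance() to a target whose high part is highTarget; the lower bits would then be compared from there.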
I'm looking forward to the index.
For bit densities > 1/2 there is a clear need for WAH8 and Elias-Fano to be able to encode the inverse set. Could that be done by a common wrapper?
I guess so.
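One possible shape for such a wrapper, purely as a sketch: the wrapped set stores the docs that are absent, and the wrapper iterates over the gaps. The `SimpleDocIterator` interface below is a hypothetical stand-in, not the Lucene iterator API, and `maxDoc` is assumed to be known to the wrapper.

```java
// Hypothetical minimal iterator contract: returns docIDs in increasing order,
// then NO_MORE_DOCS once exhausted.
interface SimpleDocIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int nextDoc();
}

// Array-backed iterator, standing in for a compressed set in this sketch.
final class ArrayDocIterator implements SimpleDocIterator {
    private final int[] docs;
    private int next = 0;

    ArrayDocIterator(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    @Override
    public int nextDoc() {
        return next < docs.length ? docs[next++] : NO_MORE_DOCS;
    }
}

// Iterates over all docs in [0, maxDoc) that the wrapped iterator does NOT return.
final class ComplementIterator implements SimpleDocIterator {
    private final SimpleDocIterator excluded;
    private final int maxDoc;
    private int doc = -1;
    private int nextExcluded;

    ComplementIterator(SimpleDocIterator excluded, int maxDoc) {
        this.excluded = excluded;
        this.maxDoc = maxDoc;
        this.nextExcluded = excluded.nextDoc();
    }

    @Override
    public int nextDoc() {
        if (doc == NO_MORE_DOCS) {
            return doc;
        }
        doc++;
        // Step over every doc that the wrapped (inverse) set contains.
        while (doc < maxDoc && doc == nextExcluded) {
            doc++;
            nextExcluded = excluded.nextDoc();
        }
        if (doc >= maxDoc) {
            doc = NO_MORE_DOCS;
        }
        return doc;
    }
}
```

With this layout a set with density above 1/2 would encode only its missing docs, which is the sparser side. An advance() on the wrapper could delegate to the wrapped set's advance() in the same way, though that is not shown here.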