Maybe we should have a static utility method that checks whether this encoding will be smaller than a bit set, so that consumers of this API can opt for a FixedBitSet if their doc set is going to be dense?
We could, but in which class? CachingWrapperFilter, for example, might benefit from the memory savings, so it could go there.
Also, would the expected size be the only thing to check for? When decoding speed is also important, other DocIdSets might be preferable.
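For the static utility method suggested above, such a size check could be sketched as follows. It assumes the usual Elias-Fano size estimate: floor(log2(upperBound/numValues)) low bits per value, one stop bit per value, and upperBound >>> numLowBits upper-bit buckets. It ignores any index or object overhead, and the class and method names are hypothetical.

```java
/** Hypothetical helper: rough size comparison of Elias-Fano versus a plain bit set. */
public final class EliasFanoSizeCheck {

  /** Low bits per value: floor(log2(upperBound / numValues)), or 0 when the ratio is at most 1. */
  static int numLowBits(long numValues, long upperBound) {
    long ratio = upperBound / numValues;
    return ratio <= 1 ? 0 : 63 - Long.numberOfLeadingZeros(ratio);
  }

  /** Approximate Elias-Fano size in bits for numValues values in [0, upperBound). */
  static long eliasFanoBits(long numValues, long upperBound) {
    int l = numLowBits(numValues, upperBound);
    // low bits, plus one stop bit per value, plus the unary bucket boundaries
    return numValues * l + numValues + (upperBound >>> l) + 1;
  }

  /** True when the estimate says Elias-Fano needs fewer bits than a bit set of upperBound bits. */
  public static boolean isSmallerThanBitSet(long numValues, long upperBound) {
    if (numValues <= 0) {
      return true; // an empty set needs (almost) nothing
    }
    return eliasFanoBits(numValues, upperBound) < upperBound;
  }

  public static void main(String[] args) {
    System.out.println(isSmallerThanBitSet(1_000, 1_000_000));   // sparse: true
    System.out.println(isSmallerThanBitSet(500_000, 1_000_000)); // dense: false
  }
}
```

As noted above, size is only one criterion: decoding speed could still favour another DocIdSet even when this check returns true.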
The ceiling of the base-2 logarithm is computed with a loop.
numberOfLeadingZeros is indeed better than a loop. We need the Long variant (Long.numberOfLeadingZeros) here.
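The two variants can be sketched side by side as follows, assuming the argument is at least 1. The loop counts how many bits x - 1 needs, and Long.numberOfLeadingZeros gives the same count directly. The class and method names are illustrative only.

```java
public final class CeilLog2 {

  /** Loop version: ceil(log2(x)) for x >= 1, i.e. the number of bits needed for x - 1. */
  static int ceilLog2Loop(long x) {
    long v = x - 1;
    int k = 0;
    while (v != 0) {
      v >>>= 1;
      k++;
    }
    return k;
  }

  /** Loop-free version using Long.numberOfLeadingZeros, for x >= 1. */
  static int ceilLog2(long x) {
    return 64 - Long.numberOfLeadingZeros(x - 1);
  }

  public static void main(String[] args) {
    // Check that both versions agree on a range of inputs.
    for (long x = 1; x <= 1 << 16; x++) {
      if (ceilLog2Loop(x) != ceilLog2(x)) {
        throw new AssertionError("mismatch at " + x);
      }
    }
    System.out.println(ceilLog2(1000)); // 10
  }
}
```

The floor variant, which is what determines the number of low bits in an Elias-Fano encoder, is 63 - Long.numberOfLeadingZeros(x) for x >= 1.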
Use PackedInts.getMutable to store the low-order bits instead of a raw long[].
Can PackedInts.getMutable also be used in a codec? Longs are needed for the high bits (see below), and with plain longs the high and low bits can conveniently be stored next to each other in an index.
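To illustrate the alternative to PackedInts, here is a rough sketch of packing fixed-width low bits directly into a long[], so the same array type can hold both the low and the high bits. It assumes a width between 1 and 63 bits. The class is hypothetical and not taken from the patch.

```java
/** Hypothetical fixed-width bit packing into a plain long[] (width 1..63). */
public final class LowBitsArray {
  private final long[] words;
  private final int width; // bits per value
  private final long mask; // width low-order one bits

  public LowBitsArray(int numValues, int width) {
    this.width = width;
    this.mask = (1L << width) - 1;
    long totalBits = (long) numValues * width;
    this.words = new long[(int) ((totalBits + 63) >>> 6)];
  }

  public void set(int index, long value) {
    long bitPos = (long) index * width;
    int word = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    value &= mask;
    // Low part of the value goes into the current word; bits shifted past bit 63 are dropped here.
    words[word] = (words[word] & ~(mask << shift)) | (value << shift);
    int spill = shift + width - 64; // number of bits that do not fit in the current word
    if (spill > 0) {
      int done = 64 - shift; // bits already written
      words[word + 1] = (words[word + 1] & ~(mask >>> done)) | (value >>> done);
    }
  }

  public long get(int index) {
    long bitPos = (long) index * width;
    int word = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    long value = words[word] >>> shift;
    if (shift + width > 64) {
      value |= words[word + 1] << (64 - shift); // append the spilled high part
    }
    return value & mask;
  }

  public static void main(String[] args) {
    LowBitsArray a = new LowBitsArray(20, 7); // 7-bit values straddle word boundaries
    for (int i = 0; i < 20; i++) {
      a.set(i, (i * 37) % 128);
    }
    for (int i = 0; i < 20; i++) {
      if (a.get(i) != (i * 37) % 128) {
        throw new AssertionError("mismatch at " + i);
      }
    }
    System.out.println("ok");
  }
}
```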
Shouldn't the iterator's getCost method return efDecoder.numValues instead of efEncoder.numValues?
Maybe we could just support the encoding of monotonically increasing sequences of ints to make things simpler?
I considered a decoder that returns ints, but that would require a lot more casting in the decoder.
Decoding the unary-encoded high bits is best done on longs, so mixing longs and ints in the encoder is not really an option.
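A minimal sketch of why longs are convenient here: each value's high part is stored in unary as a set bit at position (value >>> numLowBits) + index, and decoding walks the set bits one 64-bit word at a time with Long.numberOfTrailingZeros and word &= word - 1. For brevity the low bits are kept in an unpacked long[] and there is no skipping or index. The names are illustrative, not the patch's.

```java
import java.util.Arrays;

/** Hypothetical minimal Elias-Fano encoding of a non-decreasing sequence in [0, upperBound). */
public final class UnaryHighBits {
  final int numValues;
  final int numLowBits;
  final long[] lowBits;   // unpacked for brevity
  final long[] upperBits; // unary-coded high parts

  UnaryHighBits(long[] values, long upperBound) {
    numValues = values.length;
    long ratio = numValues == 0 ? 0 : upperBound / numValues;
    numLowBits = ratio <= 1 ? 0 : 63 - Long.numberOfLeadingZeros(ratio);
    lowBits = new long[numValues];
    long numUpperBits = numValues + (upperBound >>> numLowBits) + 1;
    upperBits = new long[(int) ((numUpperBits + 63) >>> 6)];
    long lowMask = (1L << numLowBits) - 1;
    for (int i = 0; i < numValues; i++) {
      lowBits[i] = values[i] & lowMask;
      long pos = (values[i] >>> numLowBits) + i; // unary code: high part plus index
      upperBits[(int) (pos >>> 6)] |= 1L << (pos & 63);
    }
  }

  long[] decodeAll() {
    long[] out = new long[numValues];
    int i = 0;
    for (int w = 0; w < upperBits.length && i < numValues; w++) {
      long word = upperBits[w];
      while (word != 0 && i < numValues) {
        int bit = Long.numberOfTrailingZeros(word); // next set bit in this word
        long pos = ((long) w << 6) + bit;
        long high = pos - i;                        // undo the "+ i" of the unary code
        out[i] = (high << numLowBits) | lowBits[i];
        i++;
        word &= word - 1;                           // clear the lowest set bit
      }
    }
    return out;
  }

  public static void main(String[] args) {
    long[] values = {3, 5, 5, 9, 100, 1000};
    System.out.println(Arrays.toString(new UnaryHighBits(values, 1001).decodeAll()));
    // prints [3, 5, 5, 9, 100, 1000]
  }
}
```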
We could pass the NO_MORE_VALUES value to be used as an argument to the decoder. Would that help?
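As a sketch of that idea, the decoder keeps working on longs and simply returns whatever sentinel the caller passed in. A doc id iterator could then pass Integer.MAX_VALUE (the value of DocIdSetIterator.NO_MORE_DOCS) and cast the result to int without an extra check. The class is hypothetical, and a plain sorted long[] stands in for the real Elias-Fano decoding.

```java
/** Hypothetical decoder that returns a caller-supplied sentinel when exhausted. */
public final class SentinelDecoder {
  private final long[] values;     // stand-in for the Elias-Fano encoded sequence
  private final long noMoreValues; // caller-supplied sentinel
  private int index = -1;

  public SentinelDecoder(long[] values, long noMoreValues) {
    this.values = values;
    this.noMoreValues = noMoreValues;
  }

  /** Returns the next value, or the caller's sentinel once the sequence is exhausted. */
  public long nextValue() {
    if (index < values.length) {
      index++;
    }
    return index < values.length ? values[index] : noMoreValues;
  }

  public static void main(String[] args) {
    SentinelDecoder d = new SentinelDecoder(new long[] {3, 17}, Integer.MAX_VALUE);
    int doc;
    while ((doc = (int) d.nextValue()) != Integer.MAX_VALUE) {
      System.out.println(doc); // prints 3, then 17
    }
  }
}
```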
As to why decoding the unary-encoded high bits is best done on longs, see Algorithm 2 in "Broadword Implementation of Rank/Select Queries" by Sebastiano Vigna, January 30, 2012: http://vigna.di.unimi.it/ftp/papers/Broadword.pdf
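For comparison, a plain (non-broadword) way to do the same selection on longs: Long.bitCount skips whole words, and within a word the lowest set bit is cleared r times before Long.numberOfTrailingZeros finds the answer. Vigna's Algorithm 2 replaces that inner loop with branch-free broadword arithmetic; this sketch does not implement it, and the names are hypothetical.

```java
/** Hypothetical loop-based select on longs (not the broadword algorithm). */
public final class SelectInWord {

  /** Position (0..63) of the r-th set bit (r from 0) in word, or 64 if there are not enough set bits. */
  static int select(long word, int r) {
    for (int k = 0; k < r && word != 0; k++) {
      word &= word - 1; // clear the lowest set bit
    }
    return Long.numberOfTrailingZeros(word); // 64 when word == 0
  }

  /** Position of the r-th set bit (r from 0) across a long[], or -1 if there are not enough set bits. */
  static long selectInArray(long[] words, long r) {
    for (int w = 0; w < words.length; w++) {
      int ones = Long.bitCount(words[w]);
      if (r < ones) {
        return ((long) w << 6) + select(words[w], (int) r);
      }
      r -= ones; // skip the whole word
    }
    return -1;
  }

  public static void main(String[] args) {
    long[] words = {0b1011L, 1L << 5}; // set bits at positions 0, 1, 3 and 69
    System.out.println(selectInArray(words, 3)); // 69
  }
}
```

Applied to the unary high bits, the high part of the value at index r is selectInArray(upperBits, r) - r.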
I also have an initial Java implementation of that, but it is not used here yet; there are only a few comments in the code noting that it might be used. I'll open another issue for broadword bit selection later.