Here is an implementation of a compressed doc ID set. Although it is immutable, only supports sequential access and requires doc IDs to be provided in order at building time, it supports fast iteration (nextDoc), skipping (advance), union and intersection. The detailed format is a bit complex (see javadocs), but the rough idea is to store large sequences of null bytes as a VInt and sequences of non-null bytes as-is. This implementation can compress data as soon as it can find more than 2 consecutive null bytes. Moreover, even incompressible sets only require a few bytes more than FixedBitSet.
I ran a quick benchmark to measure the size (as reported by RamUsageEstimator) depending on the percentage of bits set on a bit set containing 2<sup>23</sup> elements (FixedBitSet requires 1MB) as well as the time required to iterate over all document IDs compared to FixedBitSet:
|Percentage of bits set
||Iteration time/FixedBitSet iteration time
Even in the worst case, memory usage exceeds the memory usage of FixedBitSet by a few bytes, and iteration is 2.5 times slower.
The patch includes the set implementation but it is not used anywhere yet. I was thinking about using it automatically instead of FixedBitSet in CachingWrapperFilter but it looks like ToParentBlockJoinQuery expects to get a FixedBitSet from the cache.