I tried to come up with a sensible performance test to determine a good criterium to choose between OpenBitSet and SortedVIntList as the DocIdSet supporting data structure to be cached.
There is a criterium for this in the patch in docIdSetToCache() method of CachingWrapperFilter, but it's only based on byte size, and it favours SortedVIntList when it is defenitely more compact than OpenBitSet.
The current criterium is to use (cardinality (=nr bits set in OpenBitSet) < maxDocs/9) as a test to prefer SortedVIntList over OpenBitSet for caching. The constant 9 might be replaced by a configuration parameter to allow easy performance experiments there. It could be that a larger value than 9 is turns out to be "optimal" in runtime.
In some cases OpenBitSet can be faster on skipTo(int docNum) than SortedVIntList, even when SortedVIntList is more compact. As Filters can be expected to use skipTo() heavily, this could be important for performance.
Even even though it might be possible to measure the skipTo() performance directly, the effect of the more compact cached data structure of SortedVIntList on garbage collection is (pretty close to) impossible to measure in a simple test case.
Eks Dev had some interesting results there in the very early stages of
LUCENE-584 (September 2006), so I wonder whether these results could be confirmed somehow using the patch here and the current trunk.