I was about to post an updated patch when I realized that now OBS.length() is broken. It currently returns bits.length << 6, which is completely unrelated to neither wlen nor numBits. Bits.length() javadocs say: "Returns the number of bits in this set", what should we return?
- If we want to adhere to the jdocs, we should return numBits, but then numBits cannot be an assertion-only member.
- If we want to return wlen << 6, then we should nuke numBits because otherwise you will call length() to receive e.g 64 but when you will call fastSet(52) you may trip the assertion error.
- If we return bits.length, then we can nuke both wlen and numBits.
Actually, OpenBitSetIterator is also broken, because it iterates up to wlen, again with no relation to numBits. And of course it's now unrelated to bits.length too, so e.g. if you call OBS.length() and compares that with the number of nextDoc() calls the iterator returns (say when all bits are 'set'), the numbers are not equal...
Why do we need these two fields? As far as I can tell, wlen is only < bits.length when someone shares a long but limit its size. Do we really care about this case? I can understand the reason to keep numBits because e.g. you want to adhere to Bits spec:
- Return numBits from length()
- Bits.get(int) document that you should not call it with out of bounds indexes
- Iterator should iterate up to the last set/cleared bit
But it has to be a first-class member, not an assert-only one.
Another quirkiness, what should the app expect to happen if it calls obs.get(170) (and say bits.length=1)? It currently receives false and not e.g. AIOOBE. But this could falsly lead to calling fastSet(170) thinking that the bitset is at the right size. I'm not saying this is a super-duper usecase we should handle, since if an app calls fastSet, it should also call fastGet, but I think it will be good if we are consistent about when you get false and when you hit out-of-bounds.
The out-of-bounds here are also related to 'wlen', as the method checks if the bit is within bits.length range, not wlen range. This is another bug I think since if you share a bits which has say bit 170 set, but you limit it to 2 words, calling OBS.get(170) will false return true?
I think this class could use some serious overhauling and re-thinking. I find it very confusing as it is written now. We should decide what are the contracts of each method, and then implement it as such.
And though unrelated to the bugs reported in this issue, perhaps someone can explain why we need both the 'int' and 'long' method variants? Can't we have only the long indexes?