|
[
Permlink
| « Hide
]
Eks Dev added a comment - 07/Feb/08 04:03 PM
Filter Bug
The problem is that in IndexSearcher#search() scorer.skipTo() is called without checking if the scorer is already at the same doc as the Filter's docIdSetIterator. And scorer.skipTo(scorer.doc()) behaves as scorer.next():
/** Skips entries to the first beyond the current whose document number is
* greater than or equal to <i>target</i>. <p>Returns true iff there is such
* an entry. <p>Behaves as if written: <pre>
* boolean skipTo(int target) {
* do {
* if (!next())
* return false;
* } while (target > doc());
* return true;
* }
* </pre>
* Some implementations are considerably more efficient than that.
*/
public abstract boolean skipTo(int target) throws IOException;
which means that it is possible to miss the current doc (as Eks' testcase shows). All tests (including the new one) pass with this patch. I'll commit soon. Committed. Thanks for finding this Eks!
The patch looks correct to me, I missed the skipTo() precondition.
I wonder how that passed all the filter tests, but ok. Can I request to rename docSetIdIterator to filterDocIdIterator? Oops, I missed the commit for that request. Life goes fast sometimes.
Actually I committed it after I read your request. Thank you for fixing it in no time
I am getting confused with skipping iterators semantics, is this requirement for the other DocIdSetIterators, of only for scorers (should be, I guess)? iterator.skipTo(iterator.doc()) <=> iterator.next();// is this contract? if that is the case, we have another bug in OpenBitSetIterator (border condition) //this is the code in javadoc, "official contract" public void testOpenBitSetBorderCondition() throws IOException { OpenBitSet bs = new OpenBitSet(); bs.set(0); DocIdSetIterator i = bs.iterator(); i.skipTo(i.doc()); assertEquals(0, i.doc()); //cool, moved to the first legal position assertFalse("End of Matcher", i.skipTo(i.doc())); //NOT OK according to the javadoc } public void testOpenBitSetBorderConditionSimulated() throws IOException { OpenBitSet bs = new OpenBitSet(); bs.set(0); DocIdSetIterator i = bs.iterator(); simulatedSkipTo(i, i.doc()); assertEquals(0, i.doc()); //cool, moved to the first legal position assertFalse("End of Matcher", simulatedSkipTo(i, i.doc())); //OK according to the javadoc!! } > iterator.skipTo(iterator.doc()) <=> iterator.next();// is this contract?
Yes. The reason is that TermDocs#doc() cannot be called when a TermDocs is first created, since it is then positioned before the first entry. One must call next() at least once before first calling doc(). If the TermDocs is empty, then doc() should never be called. Consider the case of passing an empty TermDocs to skipTo(int): the call to next must be made, so that 'false' is returned without ever calling doc(). There are other ways of doing this, like defining that doc() returns -1 before next() has ever been called, and/or returning Integer.MAX_VALUE after the last document. But, for better or worse, that's not the design that was chosen. Thank for explaining it!
So we have now classes implementing DocIdSetIterator (OpenBitSetIterator, SortedVIntList...) that are strictly speaking not conforming to the specification for skipTo(). Side-effects we had here are probably local for this issue, but I have somehow bad feeling having different behaving implementations of the same interface. Sounds paranoid, no To make things better, new classes in core like eg. OpenBitSet cover the case you described, when we have iterator positioned one before the first one, but they do not comply to other side effects. Mainly, invoking iterator.skipTo(anything <= iterator.doc()) should have the same effect as next(), meaning that iterator gets moved not only in iterator.skipTo(iterator.doc()) ... to cut to the chase, should we attempt to fix all OpenDocIdSetIterator implementations to comply to these effects, or it will be enough to comment these differences "relaxed skipTo contract"? Current usage of these classes is in Filter related code and is practically replacement for BitSet iteration, therefore "under control". But if we move on using these classes tightly with Scorers I am afraid we could expect "one off" and similar bugs. Another option would be to change specification and use this sentinel -1 approach, but honestly, this is way above my head to comment... Some of the bugs caused by this skipTo() behaviour are hard to catch:
http://issues.apache.org/bugzilla/show_bug.cgi?id=35823 Basically the fix was to guard every invocation of skipTo() with a target > doc() In the above case I still don't know what the exact cause was, as the last patch One way to 'fix' this is by adding to the javadoc of skipTo() that the behaviour is Another way to go about this is to consider target<=doc() on entry of skipTo a bug, For the moment I prefer the latter, it is a bit drastic, but it gets rid of a lot of uncertainty. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||