Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      See attached JUnitTest, self-explanatory

      1. TestFilteredSearch.java
        3 kB
        Eks Dev
      2. lucene-1169.patch
        4 kB
        Michael Busch

        Activity

        Hide
        paul.elschot@xs4all.nl Paul Elschot added a comment -

        Some of the bugs caused by this skipTo() behaviour are hard to catch:
        http://issues.apache.org/bugzilla/show_bug.cgi?id=35823

        Basically the fix was to guard every invocation of skipTo() with a target > doc()
        test when no advancing should be done.

        In the above case I still don't know what the exact cause was, as the last patch
        added this guarding test in more than one place.

        One way to 'fix' this is by adding to the javadoc of skipTo() that the behaviour is
        undefined when target <= doc(), and otherwise the behaviour is the old behaviour.
        Implementations should then define the behaviour when target <= doc().
        This has the advantage that the only way to fix it is by reviewing all the
        skipTo(targetDocId) code when the javadoc does not completely define the behaviour
        of an implementation.

        Another way to go about this is to consider target<=doc() on entry of skipTo a bug,
        and add sth like this:
        assert (notInitialized and (target >= 0) or (target > doc());
        at the entry of each skipTo implementation in the trunk and fix the bugs as they show up.

        For the moment I prefer the latter, it is a bit drastic, but it gets rid of a lot of uncertainty.
        Anyway, when taking it that far, it's another issue.

        Show
        paul.elschot@xs4all.nl Paul Elschot added a comment - Some of the bugs caused by this skipTo() behaviour are hard to catch: http://issues.apache.org/bugzilla/show_bug.cgi?id=35823 Basically the fix was to guard every invocation of skipTo() with a target > doc() test when no advancing should be done. In the above case I still don't know what the exact cause was, as the last patch added this guarding test in more than one place. One way to 'fix' this is by adding to the javadoc of skipTo() that the behaviour is undefined when target <= doc(), and otherwise the behaviour is the old behaviour. Implementations should then define the behaviour when target <= doc(). This has the advantage that the only way to fix it is by reviewing all the skipTo(targetDocId) code when the javadoc does not completely define the behaviour of an implementation. Another way to go about this is to consider target<=doc() on entry of skipTo a bug, and add sth like this: assert (notInitialized and (target >= 0) or (target > doc()); at the entry of each skipTo implementation in the trunk and fix the bugs as they show up. For the moment I prefer the latter, it is a bit drastic, but it gets rid of a lot of uncertainty. Anyway, when taking it that far, it's another issue.
        Hide
        eksdev Eks Dev added a comment -

        Thank for explaining it!

        So we have now classes implementing DocIdSetIterator (OpenBitSetIterator, SortedVIntList...) that are strictly speaking not conforming to the specification for skipTo(). Side-effects we had here are probably local for this issue, but I have somehow bad feeling having different behaving implementations of the same interface. Sounds paranoid, no

        To make things better, new classes in core like eg. OpenBitSet cover the case you described, when we have iterator positioned one before the first one, but they do not comply to other side effects.

        Mainly, invoking iterator.skipTo(anything <= iterator.doc()) should have the same effect as next(), meaning that iterator gets moved not only in iterator.skipTo(iterator.doc()) ...

        to cut to the chase, should we attempt to fix all OpenDocIdSetIterator implementations to comply to these effects, or it will be enough to comment these differences "relaxed skipTo contract"? Current usage of these classes is in Filter related code and is practically replacement for BitSet iteration, therefore "under control". But if we move on using these classes tightly with Scorers I am afraid we could expect "one off" and similar bugs.

        Another option would be to change specification and use this sentinel -1 approach, but honestly, this is way above my head to comment...

        Show
        eksdev Eks Dev added a comment - Thank for explaining it! So we have now classes implementing DocIdSetIterator (OpenBitSetIterator, SortedVIntList...) that are strictly speaking not conforming to the specification for skipTo(). Side-effects we had here are probably local for this issue, but I have somehow bad feeling having different behaving implementations of the same interface. Sounds paranoid, no To make things better, new classes in core like eg. OpenBitSet cover the case you described, when we have iterator positioned one before the first one, but they do not comply to other side effects. Mainly, invoking iterator.skipTo(anything <= iterator.doc()) should have the same effect as next(), meaning that iterator gets moved not only in iterator.skipTo(iterator.doc()) ... to cut to the chase, should we attempt to fix all OpenDocIdSetIterator implementations to comply to these effects, or it will be enough to comment these differences "relaxed skipTo contract"? Current usage of these classes is in Filter related code and is practically replacement for BitSet iteration, therefore "under control". But if we move on using these classes tightly with Scorers I am afraid we could expect "one off" and similar bugs. Another option would be to change specification and use this sentinel -1 approach, but honestly, this is way above my head to comment...
        Hide
        cutting Doug Cutting added a comment -

        > iterator.skipTo(iterator.doc()) <=> iterator.next();// is this contract?

        Yes. The reason is that TermDocs#doc() cannot be called when a TermDocs is first created, since it is then positioned before the first entry. One must call next() at least once before first calling doc(). If the TermDocs is empty, then doc() should never be called. Consider the case of passing an empty TermDocs to skipTo(int): the call to next must be made, so that 'false' is returned without ever calling doc().

        There are other ways of doing this, like defining that doc() returns -1 before next() has ever been called, and/or returning Integer.MAX_VALUE after the last document. But, for better or worse, that's not the design that was chosen.

        Show
        cutting Doug Cutting added a comment - > iterator.skipTo(iterator.doc()) <=> iterator.next();// is this contract? Yes. The reason is that TermDocs#doc() cannot be called when a TermDocs is first created, since it is then positioned before the first entry. One must call next() at least once before first calling doc(). If the TermDocs is empty, then doc() should never be called. Consider the case of passing an empty TermDocs to skipTo(int): the call to next must be made, so that 'false' is returned without ever calling doc(). There are other ways of doing this, like defining that doc() returns -1 before next() has ever been called, and/or returning Integer.MAX_VALUE after the last document. But, for better or worse, that's not the design that was chosen.
        Hide
        eksdev Eks Dev added a comment -

        Thank you for fixing it in no time But...

        I am getting confused with skipping iterators semantics,

        is this requirement for the other DocIdSetIterators, of only for scorers (should be, I guess)?

        iterator.skipTo(iterator.doc()) <=> iterator.next();// is this contract?

        if that is the case, we have another bug in OpenBitSetIterator (border condition)

        //this is the code in javadoc, "official contract"
        boolean simulatedSkipTo(DocIdSetIterator i, int target) throws IOException {
        do

        { if (!i.next()) return false; }

        while (target > i.doc());
        return true;
        }

        public void testOpenBitSetBorderCondition() throws IOException

        { OpenBitSet bs = new OpenBitSet(); bs.set(0); DocIdSetIterator i = bs.iterator(); i.skipTo(i.doc()); assertEquals(0, i.doc()); //cool, moved to the first legal position assertFalse("End of Matcher", i.skipTo(i.doc())); //NOT OK according to the javadoc }

        public void testOpenBitSetBorderConditionSimulated() throws IOException

        { OpenBitSet bs = new OpenBitSet(); bs.set(0); DocIdSetIterator i = bs.iterator(); simulatedSkipTo(i, i.doc()); assertEquals(0, i.doc()); //cool, moved to the first legal position assertFalse("End of Matcher", simulatedSkipTo(i, i.doc())); //OK according to the javadoc!! }
        Show
        eksdev Eks Dev added a comment - Thank you for fixing it in no time But... I am getting confused with skipping iterators semantics, is this requirement for the other DocIdSetIterators, of only for scorers (should be, I guess)? iterator.skipTo(iterator.doc()) <=> iterator.next();// is this contract? if that is the case, we have another bug in OpenBitSetIterator (border condition) //this is the code in javadoc, "official contract" boolean simulatedSkipTo(DocIdSetIterator i, int target) throws IOException { do { if (!i.next()) return false; } while (target > i.doc()); return true; } public void testOpenBitSetBorderCondition() throws IOException { OpenBitSet bs = new OpenBitSet(); bs.set(0); DocIdSetIterator i = bs.iterator(); i.skipTo(i.doc()); assertEquals(0, i.doc()); //cool, moved to the first legal position assertFalse("End of Matcher", i.skipTo(i.doc())); //NOT OK according to the javadoc } public void testOpenBitSetBorderConditionSimulated() throws IOException { OpenBitSet bs = new OpenBitSet(); bs.set(0); DocIdSetIterator i = bs.iterator(); simulatedSkipTo(i, i.doc()); assertEquals(0, i.doc()); //cool, moved to the first legal position assertFalse("End of Matcher", simulatedSkipTo(i, i.doc())); //OK according to the javadoc!! }
        Hide
        michaelbusch Michael Busch added a comment -

        Oops, I missed the commit for that request. Life goes fast sometimes.

        Actually I committed it after I read your request.

        Show
        michaelbusch Michael Busch added a comment - Oops, I missed the commit for that request. Life goes fast sometimes. Actually I committed it after I read your request.
        Hide
        paul.elschot@xs4all.nl Paul Elschot added a comment -

        Oops, I missed the commit for that request. Life goes fast sometimes.

        Show
        paul.elschot@xs4all.nl Paul Elschot added a comment - Oops, I missed the commit for that request. Life goes fast sometimes.
        Hide
        paul.elschot@xs4all.nl Paul Elschot added a comment -

        The patch looks correct to me, I missed the skipTo() precondition.
        I wonder how that passed all the filter tests, but ok.

        Can I request to rename docSetIdIterator to filterDocIdIterator?

        Show
        paul.elschot@xs4all.nl Paul Elschot added a comment - The patch looks correct to me, I missed the skipTo() precondition. I wonder how that passed all the filter tests, but ok. Can I request to rename docSetIdIterator to filterDocIdIterator?
        Hide
        michaelbusch Michael Busch added a comment -

        Committed. Thanks for finding this Eks!

        Show
        michaelbusch Michael Busch added a comment - Committed. Thanks for finding this Eks!
        Hide
        michaelbusch Michael Busch added a comment -

        The problem is that in IndexSearcher#search() scorer.skipTo() is called without checking if the scorer is already at the same doc as the Filter's docIdSetIterator. And scorer.skipTo(scorer.doc()) behaves as scorer.next():

          /** Skips entries to the first beyond the current whose document number is
             * greater than or equal to <i>target</i>. <p>Returns true iff there is such
             * an entry.  <p>Behaves as if written: <pre>
             *   boolean skipTo(int target) {
             *     do {
             *       if (!next())
             *         return false;
             *     } while (target > doc());
             *     return true;
             *   }
             * </pre>
             * Some implementations are considerably more efficient than that.
             */
            public abstract boolean skipTo(int target) throws IOException;
        

        which means that it is possible to miss the current doc (as Eks' testcase shows).

        All tests (including the new one) pass with this patch. I'll commit soon.

        Show
        michaelbusch Michael Busch added a comment - The problem is that in IndexSearcher#search() scorer.skipTo() is called without checking if the scorer is already at the same doc as the Filter's docIdSetIterator. And scorer.skipTo(scorer.doc()) behaves as scorer.next(): /** Skips entries to the first beyond the current whose document number is * greater than or equal to <i>target</i>. <p>Returns true iff there is such * an entry. <p>Behaves as if written: <pre> * boolean skipTo( int target) { * do { * if (!next()) * return false ; * } while (target > doc()); * return true ; * } * </pre> * Some implementations are considerably more efficient than that. */ public abstract boolean skipTo( int target) throws IOException; which means that it is possible to miss the current doc (as Eks' testcase shows). All tests (including the new one) pass with this patch. I'll commit soon.
        Hide
        eksdev Eks Dev added a comment -

        Filter Bug

        Show
        eksdev Eks Dev added a comment - Filter Bug

          People

          • Assignee:
            michaelbusch Michael Busch
            Reporter:
            eksdev Eks Dev
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development