Issue Details (XML | Word | Printable)

Key: LUCENE-1169
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Blocker Blocker
Assignee: Michael Busch
Reporter: Eks Dev
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Lucene - Java

Search with Filter does not work!

Created: 07/Feb/08 04:02 PM   Updated: 09/Feb/08 08:03 PM
Return to search
Component/s: Search
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works lucene-1169.patch 2008-02-07 10:38 PM Michael Busch 4 kB
Java Source File Licensed for inclusion in ASF works TestFilteredSearch.java 2008-02-07 04:03 PM Eks Dev 3 kB

Lucene Fields: New
Resolution Date: 07/Feb/08 11:11 PM


 Description  « Hide
See attached JUnitTest, self-explanatory

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Eks Dev added a comment - 07/Feb/08 04:03 PM
Filter Bug

Michael Busch added a comment - 07/Feb/08 10:38 PM
The problem is that in IndexSearcher#search() scorer.skipTo() is called without checking if the scorer is already at the same doc as the Filter's docIdSetIterator. And scorer.skipTo(scorer.doc()) behaves as scorer.next():
/** Skips entries to the first beyond the current whose document number is
     * greater than or equal to <i>target</i>. <p>Returns true iff there is such
     * an entry.  <p>Behaves as if written: <pre>
     *   boolean skipTo(int target) {
     *     do {
     *       if (!next())
     *         return false;
     *     } while (target > doc());
     *     return true;
     *   }
     * </pre>
     * Some implementations are considerably more efficient than that.
     */
    public abstract boolean skipTo(int target) throws IOException;

which means that it is possible to miss the current doc (as Eks' testcase shows).

All tests (including the new one) pass with this patch. I'll commit soon.


Michael Busch added a comment - 07/Feb/08 11:11 PM
Committed. Thanks for finding this Eks!

Paul Elschot added a comment - 07/Feb/08 11:15 PM
The patch looks correct to me, I missed the skipTo() precondition.
I wonder how that passed all the filter tests, but ok.

Can I request to rename docSetIdIterator to filterDocIdIterator?


Paul Elschot added a comment - 07/Feb/08 11:17 PM
Oops, I missed the commit for that request. Life goes fast sometimes.

Michael Busch added a comment - 07/Feb/08 11:39 PM

Oops, I missed the commit for that request. Life goes fast sometimes.

Actually I committed it after I read your request.


Eks Dev added a comment - 08/Feb/08 10:08 AM
Thank you for fixing it in no time But...

I am getting confused with skipping iterators semantics,

is this requirement for the other DocIdSetIterators, of only for scorers (should be, I guess)?

iterator.skipTo(iterator.doc()) <=> iterator.next();// is this contract?

if that is the case, we have another bug in OpenBitSetIterator (border condition)

//this is the code in javadoc, "official contract"
boolean simulatedSkipTo(DocIdSetIterator i, int target) throws IOException {
do { if (!i.next()) return false; } while (target > i.doc());
return true;
}

public void testOpenBitSetBorderCondition() throws IOException { OpenBitSet bs = new OpenBitSet(); bs.set(0); DocIdSetIterator i = bs.iterator(); i.skipTo(i.doc()); assertEquals(0, i.doc()); //cool, moved to the first legal position assertFalse("End of Matcher", i.skipTo(i.doc())); //NOT OK according to the javadoc }

public void testOpenBitSetBorderConditionSimulated() throws IOException { OpenBitSet bs = new OpenBitSet(); bs.set(0); DocIdSetIterator i = bs.iterator(); simulatedSkipTo(i, i.doc()); assertEquals(0, i.doc()); //cool, moved to the first legal position assertFalse("End of Matcher", simulatedSkipTo(i, i.doc())); //OK according to the javadoc!! }


Doug Cutting added a comment - 08/Feb/08 05:33 PM
> iterator.skipTo(iterator.doc()) <=> iterator.next();// is this contract?

Yes. The reason is that TermDocs#doc() cannot be called when a TermDocs is first created, since it is then positioned before the first entry. One must call next() at least once before first calling doc(). If the TermDocs is empty, then doc() should never be called. Consider the case of passing an empty TermDocs to skipTo(int): the call to next must be made, so that 'false' is returned without ever calling doc().

There are other ways of doing this, like defining that doc() returns -1 before next() has ever been called, and/or returning Integer.MAX_VALUE after the last document. But, for better or worse, that's not the design that was chosen.


Eks Dev added a comment - 09/Feb/08 01:27 PM
Thank for explaining it!

So we have now classes implementing DocIdSetIterator (OpenBitSetIterator, SortedVIntList...) that are strictly speaking not conforming to the specification for skipTo(). Side-effects we had here are probably local for this issue, but I have somehow bad feeling having different behaving implementations of the same interface. Sounds paranoid, no

To make things better, new classes in core like eg. OpenBitSet cover the case you described, when we have iterator positioned one before the first one, but they do not comply to other side effects.

Mainly, invoking iterator.skipTo(anything <= iterator.doc()) should have the same effect as next(), meaning that iterator gets moved not only in iterator.skipTo(iterator.doc()) ...

to cut to the chase, should we attempt to fix all OpenDocIdSetIterator implementations to comply to these effects, or it will be enough to comment these differences "relaxed skipTo contract"? Current usage of these classes is in Filter related code and is practically replacement for BitSet iteration, therefore "under control". But if we move on using these classes tightly with Scorers I am afraid we could expect "one off" and similar bugs.

Another option would be to change specification and use this sentinel -1 approach, but honestly, this is way above my head to comment...


Paul Elschot added a comment - 09/Feb/08 08:03 PM
Some of the bugs caused by this skipTo() behaviour are hard to catch:
http://issues.apache.org/bugzilla/show_bug.cgi?id=35823

Basically the fix was to guard every invocation of skipTo() with a target > doc()
test when no advancing should be done.

In the above case I still don't know what the exact cause was, as the last patch
added this guarding test in more than one place.

One way to 'fix' this is by adding to the javadoc of skipTo() that the behaviour is
undefined when target <= doc(), and otherwise the behaviour is the old behaviour.
Implementations should then define the behaviour when target <= doc().
This has the advantage that the only way to fix it is by reviewing all the
skipTo(targetDocId) code when the javadoc does not completely define the behaviour
of an implementation.

Another way to go about this is to consider target<=doc() on entry of skipTo a bug,
and add sth like this:
assert (notInitialized and (target >= 0) or (target > doc());
at the entry of each skipTo implementation in the trunk and fix the bugs as they show up.

For the moment I prefer the latter, it is a bit drastic, but it gets rid of a lot of uncertainty.
Anyway, when taking it that far, it's another issue.