Lucene - Core
  1. Lucene - Core
  2. LUCENE-330

[PATCH] Use filter bits for next() and skipTo() in FilteredQuery

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Environment:

      Operating System: other
      Platform: Other

      Description

      This improves performance of FilteredQuery by not calling score()
      on documents that do not pass the filter.
      This passes the current tests for FilteredQuery, but these tests
      have not been adapted/extended.

      1. ASF.LICENSE.NOT.GRANTED--FilteredQuery.java
        8 kB
        Paul Elschot
      2. ASF.LICENSE.NOT.GRANTED--FilteredQuery.java
        9 kB
        Paul Elschot
      3. ASF.LICENSE.NOT.GRANTED--FilteredQuery.java
        9 kB
        Paul Elschot
      4. ASF.LICENSE.NOT.GRANTED--FilteredQuery.java
        10 kB
        Paul Elschot
      5. ASF.LICENSE.NOT.GRANTED--FilteredQueryPatch1.txt
        3 kB
        Paul Elschot
      6. ASF.LICENSE.NOT.GRANTED--IndexSearcherPatch2.txt
        3 kB
        Paul Elschot
      7. ASF.LICENSE.NOT.GRANTED--SkipFilter.java
        1 kB
        Paul Elschot
      8. ASF.LICENSE.NOT.GRANTED--SkipFilter.java
        1 kB
        Paul Elschot

        Activity

        Hide
        Paul Elschot added a comment -

        Created an attachment (id=13910)
        Use filter bits for next() and skipTo() in FilteredQuery

        cvs diff -u, 6 Jan 2005

        Show
        Paul Elschot added a comment - Created an attachment (id=13910) Use filter bits for next() and skipTo() in FilteredQuery cvs diff -u, 6 Jan 2005
        Hide
        Paul Elschot added a comment -

        The patch requires java 1.4 because it uses BitSet.nextSetBit()

        Show
        Paul Elschot added a comment - The patch requires java 1.4 because it uses BitSet.nextSetBit()
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=13914)
        Patch to IndexSearcher.java to use FilteredQuery

        This patch actually simplifies IndexSearcher.java .
        Used cvs diff -u against current head, 6 Jan 2004.

        With this patch FilteredQuery is used in the following tests:
        TestDateFilter, TestFilteredQuery, TestRangeFilter and
        TestRemoteSearchable.

        Obtained this list by running all tests with a System.out.println(this);
        in the constructor of FilteredQuery.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - Created an attachment (id=13914) Patch to IndexSearcher.java to use FilteredQuery This patch actually simplifies IndexSearcher.java . Used cvs diff -u against current head, 6 Jan 2004. With this patch FilteredQuery is used in the following tests: TestDateFilter, TestFilteredQuery, TestRangeFilter and TestRemoteSearchable. Obtained this list by running all tests with a System.out.println(this); in the constructor of FilteredQuery. Regards, Paul Elschot
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=13939)
        Counterpart of Filter to enable use of SortedVIntList

        Show
        Paul Elschot added a comment - Created an attachment (id=13939) Counterpart of Filter to enable use of SortedVIntList
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=13940)
        Reworked FilteredQuery to also use IterFilter

        Using both filter types required introducing some abstract inline superclasses
        for the Weight and the Scorer.
        The inline subclasses for the Scorer are a bit too large for good
        readability.
        All Lucene tests pass.

        This version always converts a BitSet filter to an IterFilter for testing.
        This conversion is done by an calling an the useIterFilter()
        method in the BitSet constructor.
        Leaving out this call provides backward compatible behaviour.

        All tests pass, with and without the call to useIterFilter().
        Some tests from TestRangeFilter even show a reduced
        size for the SortedVIntList when compared to the BitSet.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - Created an attachment (id=13940) Reworked FilteredQuery to also use IterFilter Using both filter types required introducing some abstract inline superclasses for the Weight and the Scorer. The inline subclasses for the Scorer are a bit too large for good readability. All Lucene tests pass. This version always converts a BitSet filter to an IterFilter for testing. This conversion is done by an calling an the useIterFilter() method in the BitSet constructor. Leaving out this call provides backward compatible behaviour. All tests pass, with and without the call to useIterFilter(). Some tests from TestRangeFilter even show a reduced size for the SortedVIntList when compared to the BitSet. Regards, Paul Elschot
        Hide
        Paul Elschot added a comment -

        This 5 Jan 2005 version of FilteredQuery.java could be simplified
        by deriving an intIterator directly from a BitSet.
        In the useIterFilter() method this is done with a SortedVIntList
        in between, but it could easily be done directly.

        Perhaps the intIterator attachment should be renamed to DocNrIter
        or DocNrIterator.

        With the current Filter only a java.lang.BitSet
        can be used as the underlying data structure for the filter.
        IterFilter allows the use of other data structures,
        eg. an int array of document numbers, or a SortedVIntList.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - This 5 Jan 2005 version of FilteredQuery.java could be simplified by deriving an intIterator directly from a BitSet. In the useIterFilter() method this is done with a SortedVIntList in between, but it could easily be done directly. Perhaps the intIterator attachment should be renamed to DocNrIter or DocNrIterator. With the current Filter only a java.lang.BitSet can be used as the underlying data structure for the filter. IterFilter allows the use of other data structures, eg. an int array of document numbers, or a SortedVIntList. Regards, Paul Elschot
        Hide
        Paul Elschot added a comment -

        (In reply to comment #6)
        > This 5 Jan 2005 version of FilteredQuery.java could be simplified

        That should be 8 Jan 2005.

        Show
        Paul Elschot added a comment - (In reply to comment #6) > This 5 Jan 2005 version of FilteredQuery.java could be simplified That should be 8 Jan 2005.
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=13944)
        Refactoring of FilteredQuery, 2nd version of 8 Jan 2005.

        FilteredScorer now has separate methods for next() and skipTo()
        for the filter (abstract filterNext() and abstract filterSkipTo())
        and for the query scorer (scorerNext() and scorerSkipTo()).
        The next() and skipTo() methods work like ConjunctionScorer,
        but the filter and query scorer are separately called.
        The filter is always used before the query scorer to minimize I/O.

        The inline subclasses of FilteredScorer for Filter and IterFilter
        only differ for the filterNext() and filterSkipTo() methods and on a helper
        method for explain(), so they now have a readable size.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - Created an attachment (id=13944) Refactoring of FilteredQuery, 2nd version of 8 Jan 2005. FilteredScorer now has separate methods for next() and skipTo() for the filter (abstract filterNext() and abstract filterSkipTo()) and for the query scorer (scorerNext() and scorerSkipTo()). The next() and skipTo() methods work like ConjunctionScorer, but the filter and query scorer are separately called. The filter is always used before the query scorer to minimize I/O. The inline subclasses of FilteredScorer for Filter and IterFilter only differ for the filterNext() and filterSkipTo() methods and on a helper method for explain(), so they now have a readable size. Regards, Paul Elschot
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=14023)
        SkipFilter.java

        A query filter in the form of a DocNrSkipper from an IndexReader.

        Show
        Paul Elschot added a comment - Created an attachment (id=14023) SkipFilter.java A query filter in the form of a DocNrSkipper from an IndexReader.
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=14024)
        SkipFilter.java, 16 Jan 2005

        FilteredQuery.java using skipTo() on the query, and extended
        with a SkipFilter constructor.

        Show
        Paul Elschot added a comment - Created an attachment (id=14024) SkipFilter.java, 16 Jan 2005 FilteredQuery.java using skipTo() on the query, and extended with a SkipFilter constructor.
        Hide
        Paul Elschot added a comment -

        The patch in IndexSearcher is independent of the rest of these
        attachments.

        SkipFilter.java also contains a method to replace the Filter by
        a SkipFilter. This method is for testing only.

        With and without the use of this method, and with the patch to
        IndexSearcher, all current tests pass.
        This could indicate that the combination of BooleanQuery and Filter
        is is missing from the current tests.

        FilteredQuery is also considerably simplified wrt. to the version
        of 8 Jan 2005.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - The patch in IndexSearcher is independent of the rest of these attachments. SkipFilter.java also contains a method to replace the Filter by a SkipFilter. This method is for testing only. With and without the use of this method, and with the patch to IndexSearcher, all current tests pass. This could indicate that the combination of BooleanQuery and Filter is is missing from the current tests. FilteredQuery is also considerably simplified wrt. to the version of 8 Jan 2005. Regards, Paul Elschot
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=14214)
        SkipFilter.java, assigned copyright to ASF

        Show
        Paul Elschot added a comment - Created an attachment (id=14214) SkipFilter.java, assigned copyright to ASF
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=14215)
        FilteredQuery.java, assigned copyright to ASG, added explain()

        Show
        Paul Elschot added a comment - Created an attachment (id=14215) FilteredQuery.java, assigned copyright to ASG, added explain()
        Hide
        Erik Hatcher added a comment -

        Paul - it is unfortunate that we've let this patch sit for as long as it has. I've just encountered issues with FilteredQuery myself and am looking to apply your patches in hopes they'll address the problem I've encountered with FilteredQuery's nested within a BooleanQuery. There is a comment in some of your code that this doesn't work with BooleanQuery though. Since the code has changed and your patches are no longer easily applied, could you advise on what the latest patches should be and how to go about going from trunk to these patches? Many thanks!

        Show
        Erik Hatcher added a comment - Paul - it is unfortunate that we've let this patch sit for as long as it has. I've just encountered issues with FilteredQuery myself and am looking to apply your patches in hopes they'll address the problem I've encountered with FilteredQuery's nested within a BooleanQuery. There is a comment in some of your code that this doesn't work with BooleanQuery though. Since the code has changed and your patches are no longer easily applied, could you advise on what the latest patches should be and how to go about going from trunk to these patches? Many thanks!
        Hide
        Yonik Seeley added a comment -

        There are multiple related issues here

        • introduction and use of SkipFilter, which returns DocNrSkipper instead of a BitSet.
        • changes to IndexSearcher.search(...,filter) to create and use a FilteredQuery

        Without taking the full plunge into DocNrSkippers yet, the applicable patch to keep FilteredQuery from
        matching filtered documents would be
        http://issues.apache.org/jira/secure/attachment/12312490/FilteredQueryPatch1.txt

        > There is a comment in some of your code that this doesn't work with BooleanQuery though
        The patch I referenced should work. Paul is refering to BooleanScorer which doesn't implement skipTo, as opposed to BooleanScorer2, which does.

        If the 1.4 BooleanScorer isn't used, then FilteredQueryPatch1.txt can be made more efficient by using scorer.skipTo() in conjunction with bitset.nextSetBit() (per Paul's comments).

        Show
        Yonik Seeley added a comment - There are multiple related issues here introduction and use of SkipFilter, which returns DocNrSkipper instead of a BitSet. changes to IndexSearcher.search(...,filter) to create and use a FilteredQuery Without taking the full plunge into DocNrSkippers yet, the applicable patch to keep FilteredQuery from matching filtered documents would be http://issues.apache.org/jira/secure/attachment/12312490/FilteredQueryPatch1.txt > There is a comment in some of your code that this doesn't work with BooleanQuery though The patch I referenced should work. Paul is refering to BooleanScorer which doesn't implement skipTo, as opposed to BooleanScorer2, which does. If the 1.4 BooleanScorer isn't used, then FilteredQueryPatch1.txt can be made more efficient by using scorer.skipTo() in conjunction with bitset.nextSetBit() (per Paul's comments).
        Hide
        Yonik Seeley added a comment -

        > The patch I referenced should work.

        To be more explicit, that patch should work with both BooleanScorer and BooleanScorer2 because it doesn't use skipTo() in next().
        If the optimization was made to use skipTo() in next(), then only BooleanScorer2 would work.

        Show
        Yonik Seeley added a comment - > The patch I referenced should work. To be more explicit, that patch should work with both BooleanScorer and BooleanScorer2 because it doesn't use skipTo() in next(). If the optimization was made to use skipTo() in next(), then only BooleanScorer2 would work.
        Hide
        Erik Hatcher added a comment -

        I manually applied that patch (prior to my first comment actually) as automatically applying didn't work. I just committed another test to TestFilteredQuery, which fails with this patch with this error:

        java.lang.IndexOutOfBoundsException: Not a valid hit number: 0
        at org.apache.lucene.search.Hits.hitDoc(Hits.java:134)
        at org.apache.lucene.search.Hits.id(Hits.java:116)
        at org.apache.lucene.search.TestFilteredQuery.testBoolean(TestFilteredQuery.java:139)

        I'm fairly confident I applied the patch correctly, though I suppose its possible I missed something.

        Here's an inlined version of the diff I have locally of FilteredQuery:

        $ svn diff FilteredQuery.java
        Index: FilteredQuery.java
        ===================================================================
        — FilteredQuery.java (revision 383339)
        +++ FilteredQuery.java (working copy)
        @@ -34,6 +34,7 @@

        • <p>Created: Apr 20, 2004 8:58:29 AM
          *
        • @author Tim Jones
          + * @author Paul Elschot
        • @since 1.4
        • @version $Id$
        • @see CachingWrapperFilter
          @@ -75,22 +76,42 @@
          // return this query
          public Query getQuery() { return FilteredQuery.this; }
        • // return a scorer that overrides the enclosed query's score if
        • // the given hit has been filtered out.
        • public Scorer scorer (IndexReader indexReader) throws IOException {
          + // return a filtering scorer
          + public Scorer scorer (IndexReader indexReader) throws IOException {
          final Scorer scorer = weight.scorer (indexReader);
          final BitSet bitset = filter.bits (indexReader);
          return new Scorer (similarity) {
        • // pass these methods through to the enclosed scorer
        • public boolean next() throws IOException { return scorer.next(); }

          + public boolean next() throws IOException {
          + do

          Unknown macro: {+ if (! scorer.next()) { + return false; + }
          + } while (! bitset.get(scorer.doc()));
          + /* When skipTo() is allowed on scorer it should be used here
          + * in combination with bitset.nextSetBit(...)
          + * See the while loop in skipTo() below.
          + */
          + return true;
          + }
          public int doc() { return scorer.doc(); }
          - public boolean skipTo (int i) throws IOException { return scorer.skipTo(i); }

          - // if the document has been filtered out, set score to 0.0
          - public float score() throws IOException { - return (bitset.get(scorer.doc())) ? scorer.score() : 0.0f; - }
          + public boolean skipTo(int i) throws IOException {
          + if (! scorer.skipTo) { + return false; + }
          + while (! bitset.get(scorer.doc())) {
          + int nextFiltered = bitset.nextSetBit(scorer.doc() + 1);
          + if (nextFiltered == -1) { + return false; + } else if (! scorer.skipTo(nextFiltered)) { + return false; + }+ }

          + return true;
          + }
          +
          + public float score() throws IOException

          { return scorer.score(); }

        // add an explanation about whether the document was filtered
        public Explanation explain (int i) throws IOException {

        What am I missing?

        Show
        Erik Hatcher added a comment - I manually applied that patch (prior to my first comment actually) as automatically applying didn't work. I just committed another test to TestFilteredQuery, which fails with this patch with this error: java.lang.IndexOutOfBoundsException: Not a valid hit number: 0 at org.apache.lucene.search.Hits.hitDoc(Hits.java:134) at org.apache.lucene.search.Hits.id(Hits.java:116) at org.apache.lucene.search.TestFilteredQuery.testBoolean(TestFilteredQuery.java:139) I'm fairly confident I applied the patch correctly, though I suppose its possible I missed something. Here's an inlined version of the diff I have locally of FilteredQuery: $ svn diff FilteredQuery.java Index: FilteredQuery.java =================================================================== — FilteredQuery.java (revision 383339) +++ FilteredQuery.java (working copy) @@ -34,6 +34,7 @@ <p>Created: Apr 20, 2004 8:58:29 AM * @author Tim Jones + * @author Paul Elschot @since 1.4 @version $Id$ @see CachingWrapperFilter @@ -75,22 +76,42 @@ // return this query public Query getQuery() { return FilteredQuery.this; } // return a scorer that overrides the enclosed query's score if // the given hit has been filtered out. public Scorer scorer (IndexReader indexReader) throws IOException { + // return a filtering scorer + public Scorer scorer (IndexReader indexReader) throws IOException { final Scorer scorer = weight.scorer (indexReader); final BitSet bitset = filter.bits (indexReader); return new Scorer (similarity) { // pass these methods through to the enclosed scorer public boolean next() throws IOException { return scorer.next(); } + public boolean next() throws IOException { + do Unknown macro: {+ if (! scorer.next()) { + return false; + } + } while (! bitset.get(scorer.doc())); + /* When skipTo() is allowed on scorer it should be used here + * in combination with bitset.nextSetBit(...) + * See the while loop in skipTo() below. + */ + return true; + } public int doc() { return scorer.doc(); } - public boolean skipTo (int i) throws IOException { return scorer.skipTo(i); } - // if the document has been filtered out, set score to 0.0 - public float score() throws IOException { - return (bitset.get(scorer.doc())) ? scorer.score() : 0.0f; - } + public boolean skipTo(int i) throws IOException { + if (! scorer.skipTo ) { + return false; + } + while (! bitset.get(scorer.doc())) { + int nextFiltered = bitset.nextSetBit(scorer.doc() + 1); + if (nextFiltered == -1) { + return false; + } else if (! scorer.skipTo(nextFiltered)) { + return false; + }+ } + return true; + } + + public float score() throws IOException { return scorer.score(); } // add an explanation about whether the document was filtered public Explanation explain (int i) throws IOException { What am I missing?
        Hide
        Paul Elschot added a comment -

        (Erik
        > I've just encountered issues with FilteredQuery ...

        Could you be more specific?

        > Since the code has changed and your patches are no longer easily applied, could you advise on what the latest patches should be and how to go about going from trunk to these patches?

        My current version is adapted for gcj 4.0.0 (which can/could not deal with inline abstract classes), so I can't
        provide a fresh patch quickly at the moment.

        Yonik identified all issues, nothing to add there.

        The patch contains my name as @author, could that be removed?
        The ASL 2.0 is somewhat inconclusive as to whether that can be done without me providing a
        fresh patch without that line, but I'd like to have that line removed nonetheless.
        If necessary, I'll provide a patch for this once the tests work again.

        I'll try the new TestFilteredQuery later.

        Show
        Paul Elschot added a comment - (Erik > I've just encountered issues with FilteredQuery ... Could you be more specific? > Since the code has changed and your patches are no longer easily applied, could you advise on what the latest patches should be and how to go about going from trunk to these patches? My current version is adapted for gcj 4.0.0 (which can/could not deal with inline abstract classes), so I can't provide a fresh patch quickly at the moment. Yonik identified all issues, nothing to add there. The patch contains my name as @author, could that be removed? The ASL 2.0 is somewhat inconclusive as to whether that can be done without me providing a fresh patch without that line, but I'd like to have that line removed nonetheless. If necessary, I'll provide a patch for this once the tests work again. I'll try the new TestFilteredQuery later.
        Hide
        Erik Hatcher added a comment -

        > Could you be more specific?

        The new TestFilteredQuery shows the details of the failure, with the stack trace in my last comment provided the patch I supplied. Those are all the specifics I have.

        > The patch contains my name as @author, could that be removed?

        Sure, no problem. I simply was true to the patch you provided earlier on in this issue, but I'd be happy to remove it if this patch gets committed.

        Show
        Erik Hatcher added a comment - > Could you be more specific? The new TestFilteredQuery shows the details of the failure, with the stack trace in my last comment provided the patch I supplied. Those are all the specifics I have. > The patch contains my name as @author, could that be removed? Sure, no problem. I simply was true to the patch you provided earlier on in this issue, but I'd be happy to remove it if this patch gets committed.
        Hide
        Erik Hatcher added a comment -

        The patch from FilteredQueryPatch1.txt has been applied and committed. Thanks for the fix, Paul!

        Show
        Erik Hatcher added a comment - The patch from FilteredQueryPatch1.txt has been applied and committed. Thanks for the fix, Paul!
        Hide
        Paul Elschot added a comment -

        This was added to FilteredQuery not so long ago.

        Show
        Paul Elschot added a comment - This was added to FilteredQuery not so long ago.

          People

          • Assignee:
            Unassigned
            Reporter:
            Paul Elschot
          • Votes:
            3 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development