Lucene - Core
  1. Lucene - Core
  2. LUCENE-413

[PATCH] BooleanScorer2 ArrayIndexOutOfBoundsException + alternative NearSpans

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: core/search
    • Labels:
      None
    • Environment:

      Operating System: other
      Platform: Other

      Description

      From Erik's post at java-dev:

      >      [java] Caused by: java.lang.ArrayIndexOutOfBoundsException: 4
      >      [java]     at org.apache.lucene.search.BooleanScorer2
      > $Coordinator.coordFactor(BooleanScorer2.java:54)
      >      [java]     at org.apache.lucene.search.BooleanScorer2.score
      > (BooleanScorer2.java:292)
      ...

      and my answer:

      Probably nrMatchers is increased too often in score() by calling score()
      more than once.

      1. ASF.LICENSE.NOT.GRANTED--DisjunctionSumScorerPatch3.txt
        1 kB
        Paul Elschot
      2. ASF.LICENSE.NOT.GRANTED--DisjunctionSumScorerPatch4.txt
        0.6 kB
        Paul Elschot
      3. ASF.LICENSE.NOT.GRANTED--DisjunctionSumScorerTestPatch1.txt
        0.9 kB
        Paul Elschot
      4. ASF.LICENSE.NOT.GRANTED--NearSpansOrdered.java
        7 kB
        Paul Elschot
      5. ASF.LICENSE.NOT.GRANTED--NearSpansOrderedBugHuntPatch1.txt
        1 kB
        Paul Elschot
      6. ASF.LICENSE.NOT.GRANTED--NearSpansUnordered.java
        7 kB
        Paul Elschot
      7. ASF.LICENSE.NOT.GRANTED--SpanNearQueryPatch1.txt
        0.6 kB
        Paul Elschot
      8. ASF.LICENSE.NOT.GRANTED--SpanScorerTestPatch1.txt
        0.9 kB
        Paul Elschot
      9. DisjunctionSumScorerPatch5.txt
        0.4 kB
        Paul Elschot
      10. SpanScorer.java
        3 kB
        Paul Elschot
      11. TestSpansAdvanced.java
        6 kB
        Paul Elschot
      12. TestSpansAdvanced2.java
        4 kB
        Paul Elschot

        Activity

        Hide
        Paul Elschot added a comment -

        Created an attachment (id=15737)
        Patch for BooleanScorer2

        Do not increase nrMatchers more than once for each scored doc.

        Show
        Paul Elschot added a comment - Created an attachment (id=15737) Patch for BooleanScorer2 Do not increase nrMatchers more than once for each scored doc.
        Hide
        Erik Hatcher added a comment -

        Committed, thanks. All test cases still pass.

        Show
        Erik Hatcher added a comment - Committed, thanks. All test cases still pass.
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=15759)
        Test version of BooleanScorer2

        Since the last patch did not fix the reported bug,
        this test version of BooleanScorer2.java can be useful
        in getting more information when the array index out of
        bounds exception is thrown.
        It throws the exception earlier (when the index is incremented)
        and provides more state of the BooleanScorer2, ao. the document number.

        This version includes the previous patch, and it applies
        the test of the previous patch also in
        the BooleanScorer2.score() method.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - Created an attachment (id=15759) Test version of BooleanScorer2 Since the last patch did not fix the reported bug, this test version of BooleanScorer2.java can be useful in getting more information when the array index out of bounds exception is thrown. It throws the exception earlier (when the index is incremented) and provides more state of the BooleanScorer2, ao. the document number. This version includes the previous patch, and it applies the test of the previous patch also in the BooleanScorer2.score() method. Regards, Paul Elschot
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=15787)
        Test patch for DisjunctionSumScorer.java

        From the discussion on java-dev a possible source
        of this problem is that DisjunctionSumScorer provides a
        nrMatchers that is bigger than its number of scorers.
        This patches DisjunctionSumScorer to throw an AssertionError
        with extra state info as soon as that happens.

        Unfortunately I still have no idea what the problem is.

        Erik, could you run your test again with this patch?

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - Created an attachment (id=15787) Test patch for DisjunctionSumScorer.java From the discussion on java-dev a possible source of this problem is that DisjunctionSumScorer provides a nrMatchers that is bigger than its number of scorers. This patches DisjunctionSumScorer to throw an AssertionError with extra state info as soon as that happens. Unfortunately I still have no idea what the problem is. Erik, could you run your test again with this patch? Regards, Paul Elschot
        Hide
        Erik Hatcher added a comment -

        I applied the DisjunctionSumScorer patch, and got this output:

        java.lang.AssertionError: nrMatchers: 4 bigger than nrScorers: 3
        minimumNrMatchers: 1, currentDoc: 789, currentScore: 0.08528499
        top: org.apache.lucene.search.spans.SpanScorer@18f51f
        scorerQueue.size(): 3
        at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent
        (DisjunctionSumScorer.java:171)
        at org.apache.lucene.search.DisjunctionSumScorer.skipTo(DisjunctionSumScorer.java:224)
        at org.apache.lucene.search.BooleanScorer2.skipTo(BooleanScorer2.java:345)
        at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:56)
        at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51)
        at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:288)

        Show
        Erik Hatcher added a comment - I applied the DisjunctionSumScorer patch, and got this output: java.lang.AssertionError: nrMatchers: 4 bigger than nrScorers: 3 minimumNrMatchers: 1, currentDoc: 789, currentScore: 0.08528499 top: org.apache.lucene.search.spans.SpanScorer@18f51f scorerQueue.size(): 3 at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent (DisjunctionSumScorer.java:171) at org.apache.lucene.search.DisjunctionSumScorer.skipTo(DisjunctionSumScorer.java:224) at org.apache.lucene.search.BooleanScorer2.skipTo(BooleanScorer2.java:345) at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:56) at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51) at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:288)
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=15800)
        Second test patch for DisjunctionSumScorer.java

        In case a scorer does not advance its doc number after next()
        returns true, the originally reported exception can occur.

        This patch throws an AssertionError for the first scorer that
        behaves that way, and shows the scorer and the document numbers.

        Scorers should advance their document number after next() returns true.

        If this exception does not happen, the bug is probably in
        DisjunctionSumScorer.java.

        Erik, could you run your test once more with this patch applied
        against the trunk?

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - Created an attachment (id=15800) Second test patch for DisjunctionSumScorer.java In case a scorer does not advance its doc number after next() returns true, the originally reported exception can occur. This patch throws an AssertionError for the first scorer that behaves that way, and shows the scorer and the document numbers. Scorers should advance their document number after next() returns true. If this exception does not happen, the bug is probably in DisjunctionSumScorer.java. Erik, could you run your test once more with this patch applied against the trunk? Regards, Paul Elschot
        Hide
        Erik Hatcher added a comment -

        I don't see a difference in the output:

        java.lang.AssertionError: nrMatchers: 4 bigger than nrScorers: 3
        minimumNrMatchers: 1, currentDoc: 789, currentScore: 0.08528499
        top: org.apache.lucene.search.spans.SpanScorer@18f51f
        scorerQueue.size(): 3
        at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent
        (DisjunctionSumScorer.java:171)
        at org.apache.lucene.search.DisjunctionSumScorer.skipTo(DisjunctionSumScorer.java:224)
        at org.apache.lucene.search.BooleanScorer2.skipTo(BooleanScorer2.java:345)
        at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:56)
        at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51)
        at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:288)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:102)
        at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
        at org.apache.lucene.search.Hits.<init>(Hits.java:44)
        at org.apache.lucene.search.Searcher.search(Searcher.java:40)
        at org.apache.lucene.search.Searcher.search(Searcher.java:32)

        Show
        Erik Hatcher added a comment - I don't see a difference in the output: java.lang.AssertionError: nrMatchers: 4 bigger than nrScorers: 3 minimumNrMatchers: 1, currentDoc: 789, currentScore: 0.08528499 top: org.apache.lucene.search.spans.SpanScorer@18f51f scorerQueue.size(): 3 at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent (DisjunctionSumScorer.java:171) at org.apache.lucene.search.DisjunctionSumScorer.skipTo(DisjunctionSumScorer.java:224) at org.apache.lucene.search.BooleanScorer2.skipTo(BooleanScorer2.java:345) at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:56) at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51) at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:288) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:102) at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65) at org.apache.lucene.search.Hits.<init>(Hits.java:44) at org.apache.lucene.search.Searcher.search(Searcher.java:40) at org.apache.lucene.search.Searcher.search(Searcher.java:32)
        Hide
        Paul Elschot added a comment -

        Since there is no difference in output, the bug must be
        in DisjunctionSumScorer.advanceAfterCurrent().

        I had a good look at that code (the test version after the
        second patch), and I really can't see a way of
        increasing nrMatchers beyond the number of scorers in
        the scorer queue: in the test version, each scorer is
        verified to advance to a higher document number, and
        nrMatchers is only increased when the document number
        of a scorer is the same.

        I'm overlooking something, or at least one scorer
        does not return the same document number on successive
        calls of doc() without a next() in between.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - Since there is no difference in output, the bug must be in DisjunctionSumScorer.advanceAfterCurrent(). I had a good look at that code (the test version after the second patch), and I really can't see a way of increasing nrMatchers beyond the number of scorers in the scorer queue: in the test version, each scorer is verified to advance to a higher document number, and nrMatchers is only increased when the document number of a scorer is the same. I'm overlooking something, or at least one scorer does not return the same document number on successive calls of doc() without a next() in between. Regards, Paul Elschot
        Hide
        Erik Hatcher added a comment -

        My apologies - I apparently was not running your patch (my environment did not recompile it even
        though I had patched it). The output is now this:

        java.lang.AssertionError: Scorer: org.apache.lucene.search.spans.SpanScorer@ed9f47
        advanced from doc 112 to doc 112 after next() returned true.
        at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent
        (DisjunctionSumScorer.java:153)
        at org.apache.lucene.search.DisjunctionSumScorer.skipTo(DisjunctionSumScorer.java:232)
        at org.apache.lucene.search.BooleanScorer2.skipTo(BooleanScorer2.java:345)
        at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:56)
        at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51)
        at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:288)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:102)
        at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
        at org.apache.lucene.search.Hits.<init>(Hits.java:44)
        at org.apache.lucene.search.Searcher.search(Searcher.java:40)
        at org.apache.lucene.search.Searcher.search(Searcher.java:32)

        So SpanScorer appears to be the culprit.

        Show
        Erik Hatcher added a comment - My apologies - I apparently was not running your patch (my environment did not recompile it even though I had patched it). The output is now this: java.lang.AssertionError: Scorer: org.apache.lucene.search.spans.SpanScorer@ed9f47 advanced from doc 112 to doc 112 after next() returned true. at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent (DisjunctionSumScorer.java:153) at org.apache.lucene.search.DisjunctionSumScorer.skipTo(DisjunctionSumScorer.java:232) at org.apache.lucene.search.BooleanScorer2.skipTo(BooleanScorer2.java:345) at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:56) at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51) at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:288) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:102) at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65) at org.apache.lucene.search.Hits.<init>(Hits.java:44) at org.apache.lucene.search.Searcher.search(Searcher.java:40) at org.apache.lucene.search.Searcher.search(Searcher.java:32) So SpanScorer appears to be the culprit.
        Hide
        Paul Elschot added a comment -

        Thanks for retesting.
        It appears worthwhile to have that last check (producing
        the last output) built into DisjunctionSumScorer.java as an assert.

        Perhaps the author of SpanScorer.java could take a further look
        into this.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - Thanks for retesting. It appears worthwhile to have that last check (producing the last output) built into DisjunctionSumScorer.java as an assert. Perhaps the author of SpanScorer.java could take a further look into this. Regards, Paul Elschot
        Hide
        Paul Elschot added a comment -

        I had another look at SpanScorer.next(), and it seems to me it
        is not the source of the problem.
        So I had another look at the query (see java-dev messages of 26 July):

        Reformatted to show the structure:
        +(spanNear([FULLTEXT:cat, FULLTEXT:dog, FULLTEXT:bird], 1, true)  
          spanNear([FULLTEXT:horse, FULLTEXT:cow, FULLTEXT:pig], 1, true)  
          spanNear([FULLTEXT:snake, FULLTEXT:camel], 0, true)
        )
        +(FULLTEXT:zebra
          FULLTEXT:insect
          spanNear([FULLTEXT:feline, FULLTEXT:goat], 0, true)
        )

        This means that the NearSpans provided by a spanNear subquery
        probably is the source of the problem.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - I had another look at SpanScorer.next(), and it seems to me it is not the source of the problem. So I had another look at the query (see java-dev messages of 26 July): Reformatted to show the structure: +(spanNear( [FULLTEXT:cat, FULLTEXT:dog, FULLTEXT:bird] , 1, true)     spanNear( [FULLTEXT:horse, FULLTEXT:cow, FULLTEXT:pig] , 1, true)     spanNear( [FULLTEXT:snake, FULLTEXT:camel] , 0, true) ) +(FULLTEXT:zebra   FULLTEXT:insect   spanNear( [FULLTEXT:feline, FULLTEXT:goat] , 0, true) ) This means that the NearSpans provided by a spanNear subquery probably is the source of the problem. Regards, Paul Elschot
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=16069)
        Patch to add an assert to DisjunctionSumScorer.java

        This patch reimplements only the most critical test of the second test patch
        (15800) in the form of an assert.

        I'm using this in common-build.xml for the assert:

        <property name="javac.source" value="1.4"/>
        <property name="javac.target" value="1.4"/>

        and this in the test target:

        <assertions>
        <enable package="org.apache.lucene"/>
        </assertions>

        Show
        Paul Elschot added a comment - Created an attachment (id=16069) Patch to add an assert to DisjunctionSumScorer.java This patch reimplements only the most critical test of the second test patch (15800) in the form of an assert. I'm using this in common-build.xml for the assert: <property name="javac.source" value="1.4"/> <property name="javac.target" value="1.4"/> and this in the test target: <assertions> <enable package="org.apache.lucene"/> </assertions>
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=16070)
        NearSpansOrdered.java

        Another implementation of ordered NearSpans.
        This uses only arrays of scorers and no PriorityQueue at all.

        Show
        Paul Elschot added a comment - Created an attachment (id=16070) NearSpansOrdered.java Another implementation of ordered NearSpans. This uses only arrays of scorers and no PriorityQueue at all.
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=16071)
        Patch SpanNearQuery to use NearSpansOrdered

        With this patch, all tests pass here.

        Show
        Paul Elschot added a comment - Created an attachment (id=16071) Patch SpanNearQuery to use NearSpansOrdered With this patch, all tests pass here.
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=16072)
        Patch TestSpans.java to include more tests

        This also obsoletes TestSpans.java and NearSpans.java in bug 35664.

        Show
        Paul Elschot added a comment - Created an attachment (id=16072) Patch TestSpans.java to include more tests This also obsoletes TestSpans.java and NearSpans.java in bug 35664.
        Hide
        Paul Elschot added a comment -

        Erik, could you run your test once more with the patch to SpanNearQuery
        applied and NearSpansOrdered.java added?

        In case this NearSpansOrdered works also for your test case
        it would probably be a good idea to simplify the current NearSpans
        to handle only the unordered case.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - Erik, could you run your test once more with the patch to SpanNearQuery applied and NearSpansOrdered.java added? In case this NearSpansOrdered works also for your test case it would probably be a good idea to simplify the current NearSpans to handle only the unordered case. Regards, Paul Elschot
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=16104)
        NearSpansUnordered.java

        This is NearSpans.java specialised for the unordered case, using
        very much the same data structures.

        It needs the patch of attachment 16071 + a rename of NearSpans to
        NearSpansUnordered in the patched SpanNearQuery.java

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - Created an attachment (id=16104) NearSpansUnordered.java This is NearSpans.java specialised for the unordered case, using very much the same data structures. It needs the patch of attachment 16071 + a rename of NearSpans to NearSpansUnordered in the patched SpanNearQuery.java Regards, Paul Elschot
        Hide
        Erik Hatcher added a comment -

        Paul - thanks for your continued efforts with this issue. I have, I believe, patched everything to match
        all the attachments to this issue and am getting errors on all the new TestSpans.java tests like this:

        [junit] Testcase: testSpanNearOrdered01(org.apache.lucene.search.spans.TestSpans): Caused an
        ERROR
        [junit] subSpans1: 2 after 2
        [junit] java.lang.AssertionError: subSpans1: 2 after 2
        [junit] at org.apache.lucene.search.spans.NearSpansOrdered.advanceAfterOrdered
        (NearSpansOrdered.java:159)
        [junit] at org.apache.lucene.search.spans.NearSpansOrdered.next(NearSpansOrdered.java:95)
        [junit] at org.apache.lucene.search.spans.SpanScorer.next(SpanScorer.java:50)
        [junit] at org.apache.lucene.search.Scorer.score(Scorer.java:47)
        [junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:102)

        Re: the use of "assert" - I'm not sure we want to require JDK 1.4 just yet. We should discuss this on the
        dev list to be sure. I simply changed to AssertionError instead.

        Perhaps I've mispatched something? If you think that is the case, could you roll up all your changes
        into a single patch that I can apply to a clean trunk checkout of Lucene? Thanks!

        Show
        Erik Hatcher added a comment - Paul - thanks for your continued efforts with this issue. I have, I believe, patched everything to match all the attachments to this issue and am getting errors on all the new TestSpans.java tests like this: [junit] Testcase: testSpanNearOrdered01(org.apache.lucene.search.spans.TestSpans): Caused an ERROR [junit] subSpans1: 2 after 2 [junit] java.lang.AssertionError: subSpans1: 2 after 2 [junit] at org.apache.lucene.search.spans.NearSpansOrdered.advanceAfterOrdered (NearSpansOrdered.java:159) [junit] at org.apache.lucene.search.spans.NearSpansOrdered.next(NearSpansOrdered.java:95) [junit] at org.apache.lucene.search.spans.SpanScorer.next(SpanScorer.java:50) [junit] at org.apache.lucene.search.Scorer.score(Scorer.java:47) [junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:102) Re: the use of "assert" - I'm not sure we want to require JDK 1.4 just yet. We should discuss this on the dev list to be sure. I simply changed to AssertionError instead. Perhaps I've mispatched something? If you think that is the case, could you roll up all your changes into a single patch that I can apply to a clean trunk checkout of Lucene? Thanks!
        Hide
        Paul Elschot added a comment -

        Did you perhaps forget to invert the condition of the assert?


        Paul Elschot

        Show
        Paul Elschot added a comment - Did you perhaps forget to invert the condition of the assert? Paul Elschot
        Hide
        Erik Hatcher added a comment -

        Darn - I wish there was an "undo" on here! Sorry about that. Sure enough, I had messed up the
        conversion from assert to AssertionError. The conversion now is this in DisjunctionSumScorer:

        if (top.doc() <= currentDoc)
        throw new AssertionError("Scorer: " + top
        + "\n advanced from doc " + currentDoc
        + " to doc " + top.doc()
        + " after next() returned true.");

        The patched TestSpans passes. However with my test case I'm getting this:

        java.lang.AssertionError: Scorer: org.apache.lucene.search.spans.SpanScorer@73305c
        advanced from doc 112 to doc 112 after next() returned true.
        at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent
        (DisjunctionSumScorer.java:153)
        at org.apache.lucene.search.DisjunctionSumScorer.skipTo(DisjunctionSumScorer.java:220)
        at org.apache.lucene.search.BooleanScorer2.skipTo(BooleanScorer2.java:345)
        at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:56)
        at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51)
        at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:288)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:102)

        Show
        Erik Hatcher added a comment - Darn - I wish there was an "undo" on here! Sorry about that. Sure enough, I had messed up the conversion from assert to AssertionError. The conversion now is this in DisjunctionSumScorer: if (top.doc() <= currentDoc) throw new AssertionError("Scorer: " + top + "\n advanced from doc " + currentDoc + " to doc " + top.doc() + " after next() returned true."); The patched TestSpans passes. However with my test case I'm getting this: java.lang.AssertionError: Scorer: org.apache.lucene.search.spans.SpanScorer@73305c advanced from doc 112 to doc 112 after next() returned true. at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent (DisjunctionSumScorer.java:153) at org.apache.lucene.search.DisjunctionSumScorer.skipTo(DisjunctionSumScorer.java:220) at org.apache.lucene.search.BooleanScorer2.skipTo(BooleanScorer2.java:345) at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:56) at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51) at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:288) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:102)
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=16138)
        Bug hunt patch for SpanScorer.java

        Since the output is the same as the output of 28 July,
        I'm going back to untargeted bug hunting.

        Hopefully the state of the spans dumped in this traceback
        will help us to progress...

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - Created an attachment (id=16138) Bug hunt patch for SpanScorer.java Since the output is the same as the output of 28 July, I'm going back to untargeted bug hunting. Hopefully the state of the spans dumped in this traceback will help us to progress... Regards, Paul Elschot
        Hide
        Erik Hatcher added a comment -

        With the SpanScorer patch applied for additional output, I get this:

        SpanScorer.next() A: spans org.apache.lucene.search.spans.NearSpansOrdered@222b9a advanced from
        112 to 112

        java.lang.AssertionError: SpanScorer.next() A: spans
        org.apache.lucene.search.spans.NearSpansOrdered@222b9aadvanced from 112 to 112 at
        org.apache.lucene.search.spans.SpanScorer.next(SpanScorer.java:60) at
        org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:151) at
        org.apache.lucene.search.DisjunctionSumScorer.skipTo(DisjunctionSumScorer.java:220) at
        org.apache.lucene.search.BooleanScorer2.skipTo(BooleanScorer2.java:345) at
        org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:56) at
        org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51) at
        org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:288) at
        org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:102) at
        org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65) at org.apache.lucene.search.Hits.<init>
        (Hits.java:44) at org.apache.lucene.search.Searcher.search(Searcher.java:40) at
        org.apache.lucene.search.Searcher.search(Searcher.java:32)

        Show
        Erik Hatcher added a comment - With the SpanScorer patch applied for additional output, I get this: SpanScorer.next() A: spans org.apache.lucene.search.spans.NearSpansOrdered@222b9a advanced from 112 to 112 java.lang.AssertionError: SpanScorer.next() A: spans org.apache.lucene.search.spans.NearSpansOrdered@222b9aadvanced from 112 to 112 at org.apache.lucene.search.spans.SpanScorer.next(SpanScorer.java:60) at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:151) at org.apache.lucene.search.DisjunctionSumScorer.skipTo(DisjunctionSumScorer.java:220) at org.apache.lucene.search.BooleanScorer2.skipTo(BooleanScorer2.java:345) at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:56) at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:51) at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:288) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:102) at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65) at org.apache.lucene.search.Hits.<init> (Hits.java:44) at org.apache.lucene.search.Searcher.search(Searcher.java:40) at org.apache.lucene.search.Searcher.search(Searcher.java:32)
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=16167)
        Bug hunt patch for NearSpansOrdered.java

        It's strange that the previous output is the same as the earlier one from
        July, because the previous one uses NearSpansOrdered and the one from
        July used NearSpans from svn.
        So I'm beginning to suspect that the bug is even further away, perhaps
        in SpanTermQuery.

        Happy hunting,
        Paul Elschot

        Show
        Paul Elschot added a comment - Created an attachment (id=16167) Bug hunt patch for NearSpansOrdered.java It's strange that the previous output is the same as the earlier one from July, because the previous one uses NearSpansOrdered and the one from July used NearSpans from svn. So I'm beginning to suspect that the bug is even further away, perhaps in SpanTermQuery. Happy hunting, Paul Elschot
        Hide
        Erik Hatcher added a comment -

        I have removed a few of the attachments to this issue, as they have already been applied to the latest codebase. This will hopefully make it easier for us to work with the attachments, with less confusion.

        Show
        Erik Hatcher added a comment - I have removed a few of the attachments to this issue, as they have already been applied to the latest codebase. This will hopefully make it easier for us to work with the attachments, with less confusion.
        Hide
        Paul Elschot added a comment -

        Patch for org.apache.lucene.search.DisjunctionSumScorer.java .

        This query was reported by Ramana Jelda on java-user to have the same problem:
        +(
          +(
            spanNear([productName:mp, productName:3], 3, true)
            spanNear([subName:mp, subName:3], 3, true)
           )
         +(productName:player subName:player)
        )

        Looking at the query structure and the one reported earlier, I realized the bug
        could also be in DisjunctionSumScorer.skipTo(), and not in the span query code.

        Now let's hope this does it...

        Show
        Paul Elschot added a comment - Patch for org.apache.lucene.search.DisjunctionSumScorer.java . This query was reported by Ramana Jelda on java-user to have the same problem: +(   +(     spanNear( [productName:mp, productName:3] , 3, true)     spanNear( [subName:mp, subName:3] , 3, true)    )  +(productName:player subName:player) ) Looking at the query structure and the one reported earlier, I realized the bug could also be in DisjunctionSumScorer.skipTo(), and not in the span query code. Now let's hope this does it...
        Hide
        Paul Elschot added a comment -

        The date doesn't show up in the list of the attachments until clicking the "manage" link.
        Today's patch is called DisjunctionSumScorerPatch5.txt .

        Show
        Paul Elschot added a comment - The date doesn't show up in the list of the attachments until clicking the "manage" link. Today's patch is called DisjunctionSumScorerPatch5.txt .
        Hide
        Dallan Quass added a comment -

        I'm seeing this issue as well, but very rarely. It happens when DisjunctionSumScorer.advanceAfterCurrent comes out of the do..while loop with nrMatchers == nrScorers+1. This doesn't happens until about the 85'th document that gets scored for a particular query. I added a hack to set nrMatchers = nrScorers in this case, which fixed the problem for me, but I'd be happy to help you track down the issue since I have a particular database and query that seems to elicit it. Is there something you'd like me to do? (I see the same problem on 1_9_final and 1_9_1.)

        Show
        Dallan Quass added a comment - I'm seeing this issue as well, but very rarely. It happens when DisjunctionSumScorer.advanceAfterCurrent comes out of the do..while loop with nrMatchers == nrScorers+1. This doesn't happens until about the 85'th document that gets scored for a particular query. I added a hack to set nrMatchers = nrScorers in this case, which fixed the problem for me, but I'd be happy to help you track down the issue since I have a particular database and query that seems to elicit it. Is there something you'd like me to do? (I see the same problem on 1_9_final and 1_9_1.)
        Hide
        Dallan Quass added a comment -

        The hack I added didn't work. Although it fixed the problem for one query, I'm now seeing a different query that's giving an array out of bounds exception: this time from SpanScorer.score() (line 72), which is called near the top of DisjunctionSumScorer.advanceAfterCurrent. What's odd is that the array index is MAX_INT (2147483648). Any ideas?

        Show
        Dallan Quass added a comment - The hack I added didn't work. Although it fixed the problem for one query, I'm now seeing a different query that's giving an array out of bounds exception: this time from SpanScorer.score() (line 72), which is called near the top of DisjunctionSumScorer.advanceAfterCurrent. What's odd is that the array index is MAX_INT (2147483648). Any ideas?
        Hide
        Dallan Quass added a comment -

        Sorry for cluttering up the comments here. I wanted to add that the part of the query exhibiting the error in the second case is a boolean query with three SpanTermQuery clauses, all "Occur.SHOULD." If I change the SpanTermQuery clauses to just TermQuery clauses, the problem goes away.

        Show
        Dallan Quass added a comment - Sorry for cluttering up the comments here. I wanted to add that the part of the query exhibiting the error in the second case is a boolean query with three SpanTermQuery clauses, all "Occur.SHOULD." If I change the SpanTermQuery clauses to just TermQuery clauses, the problem goes away.
        Hide
        Paul Elschot added a comment -

        Dallan, could you try this patch: DisjunctionSumScorerPatch5.txt ?

        An index at MAX_INT in SpanScorer.score() is an indication that the
        underlying spans returned MAX_INT from spans.doc(), which should mean
        that spans.doc() was called after spans.next() or spans.skipTo(...) returned
        false.
        That means that either SpanScorer.java is incorrect, or that the Spans used
        by it is incorrect, or that SpanScorer is used incorrectly, eg. by having score()
        called on it after next() or skipTo(...) returned false.

        Show
        Paul Elschot added a comment - Dallan, could you try this patch: DisjunctionSumScorerPatch5.txt ? An index at MAX_INT in SpanScorer.score() is an indication that the underlying spans returned MAX_INT from spans.doc(), which should mean that spans.doc() was called after spans.next() or spans.skipTo(...) returned false. That means that either SpanScorer.java is incorrect, or that the Spans used by it is incorrect, or that SpanScorer is used incorrectly, eg. by having score() called on it after next() or skipTo(...) returned false.
        Hide
        Paul Elschot added a comment -

        I think I found a bug in SpanScorer.java.

        The problem is that SpanScorer.skipTo() and SpanScorer.next() are not
        designed to be used both in the same search.
        As the doc of the scored Spans is normally ahead, the attached version of
        SpanScorer adds a check in skipTo() for the scored spans not being ahead
        of the target of skipTo().
        The attached version of SpanScorer is also refactored to remove double code.

        This new SpanScorer changes the behaviour of SpanQueries and it leaves
        TestSpansAdvanced and TestSpansAdvanced2 failing. I attached a passing version for both of these tests, but they still contain some System.out.println()'s,
        and the test for the score value is changed to a warning. The same documents
        still match in the tests, only some score values change.

        I don't know whether DisjunctionSumScorerPatch5.txt is still needed, but it should not hurt.

        I didn't change the copyright year in the sources, in case they make it
        to the trunk, the year could be updated.

        Show
        Paul Elschot added a comment - I think I found a bug in SpanScorer.java. The problem is that SpanScorer.skipTo() and SpanScorer.next() are not designed to be used both in the same search. As the doc of the scored Spans is normally ahead, the attached version of SpanScorer adds a check in skipTo() for the scored spans not being ahead of the target of skipTo(). The attached version of SpanScorer is also refactored to remove double code. This new SpanScorer changes the behaviour of SpanQueries and it leaves TestSpansAdvanced and TestSpansAdvanced2 failing. I attached a passing version for both of these tests, but they still contain some System.out.println()'s, and the test for the score value is changed to a warning. The same documents still match in the tests, only some score values change. I don't know whether DisjunctionSumScorerPatch5.txt is still needed, but it should not hurt. I didn't change the copyright year in the sources, in case they make it to the trunk, the year could be updated.
        Hide
        Yonik Seeley added a comment -

        > The same documents still match in the tests, only some score values change.

        Could you elaborate on this point? Are the old or new scores correct?

        > I don't know whether DisjunctionSumScorerPatch5.txt is still needed, but it should not hurt.

        I was going to commit that one anyway... it looks like a bug if skipTo(target) is called with target<= currentDoc. Most scorers wouldn't do that, so I don't know if it's ever manifested as an error.

        Show
        Yonik Seeley added a comment - > The same documents still match in the tests, only some score values change. Could you elaborate on this point? Are the old or new scores correct? > I don't know whether DisjunctionSumScorerPatch5.txt is still needed, but it should not hurt. I was going to commit that one anyway... it looks like a bug if skipTo(target) is called with target<= currentDoc. Most scorers wouldn't do that, so I don't know if it's ever manifested as an error.
        Hide
        Paul Elschot added a comment -

        >> The same documents still match in the tests, only some score values change.
        > Could you elaborate on this point? Are the old or new scores correct?

        From a quick look the new scorer look better than the old ones. In one of the test
        docs the test query matches 3 times, and it now scores higher than the doc with the single match
        (the one in which only docs D and A match).

        > > I don't know whether DisjunctionSumScorerPatch5.txt is still needed, but it should not hurt.
        > I was going to commit that one anyway... it looks like a bug if skipTo(target) is called with target<= currentDoc.
        > Most scorers wouldn't do that, so I don't know if it's ever manifested as an error.

        The two changes (in DisjunctionSumScorerPatch5.txt and in the posted SpanScorer) are different in that
        for DisjunctionSumScorer the test is against the current doc of the scorer and in
        SpanScorer the test is against the current doc of the subscorer.
        So I think in DisjunctionSumScorer the test may not be needed, but in SpanScorer it should be there.

        However, I don't have a test case showing the out of bounds problem, so I'd like to know
        whether this problem persists with the posted SpanScorer.

        Show
        Paul Elschot added a comment - >> The same documents still match in the tests, only some score values change. > Could you elaborate on this point? Are the old or new scores correct? From a quick look the new scorer look better than the old ones. In one of the test docs the test query matches 3 times, and it now scores higher than the doc with the single match (the one in which only docs D and A match). > > I don't know whether DisjunctionSumScorerPatch5.txt is still needed, but it should not hurt. > I was going to commit that one anyway... it looks like a bug if skipTo(target) is called with target<= currentDoc. > Most scorers wouldn't do that, so I don't know if it's ever manifested as an error. The two changes (in DisjunctionSumScorerPatch5.txt and in the posted SpanScorer) are different in that for DisjunctionSumScorer the test is against the current doc of the scorer and in SpanScorer the test is against the current doc of the subscorer. So I think in DisjunctionSumScorer the test may not be needed, but in SpanScorer it should be there. However, I don't have a test case showing the out of bounds problem, so I'd like to know whether this problem persists with the posted SpanScorer.
        Hide
        Dallan Quass added a comment -

        That fixed it! I made the patch in DisjunctionSumScorerPath5.txt and used the posted SpanScorer, and I'm no longer experiencing the array index out of bounds problem. Thank-you Paul!

        Show
        Dallan Quass added a comment - That fixed it! I made the patch in DisjunctionSumScorerPath5.txt and used the posted SpanScorer, and I'm no longer experiencing the array index out of bounds problem. Thank-you Paul!
        Hide
        Yonik Seeley added a comment -

        Excellent news!

        I'll do a quick review of the changes to SpanScorer, clean up the tests, and commit (just so no one is duplicating the effort).

        Thanks Paul!

        Show
        Yonik Seeley added a comment - Excellent news! I'll do a quick review of the changes to SpanScorer, clean up the tests, and commit (just so no one is duplicating the effort). Thanks Paul!
        Hide
        Yonik Seeley added a comment -

        Committed!
        I also did some changing & refactoring on TestSpansAdvanced/TestSpansAdvanced2, added checking of explain() value, etc. All the tests now pass w/o warnings.

        Show
        Yonik Seeley added a comment - Committed! I also did some changing & refactoring on TestSpansAdvanced/TestSpansAdvanced2, added checking of explain() value, etc. All the tests now pass w/o warnings.

          People

          • Assignee:
            Yonik Seeley
            Reporter:
            Paul Elschot
          • Votes:
            1 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development