Lucene - Core
  1. Lucene - Core
  2. LUCENE-4992

ArrayOutOfBoundsException in BooleanScorer2

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.1
    • Fix Version/s: 4.4, Trunk
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Seeing following exception in BooleanScorer2 in our production system:
      Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2147483647
      at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:312)
      at org.apache.lucene.queries.CustomScoreQuery$CustomScorer.score(CustomScoreQuery.java:324)
      at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:84)
      at org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.collect(TopScoreDocCollector.java:47)
      at org.apache.lucene.search.Scorer.score(Scorer.java:64)
      at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:605)
      at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:482)
      at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:438)
      at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
      at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)

      1. LUCENE-4992.patch
        2 kB
        Robert Muir
      2. patch.diff
        0.7 kB
        John Wang

        Activity

        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Resolved Resolved
        12d 8h 16m 1 Robert Muir 22/May/13 04:38
        Resolved Resolved Closed Closed
        62d 14h 58m 1 Steve Rowe 23/Jul/13 19:37
        Steve Rowe made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hide
        Steve Rowe added a comment -

        Bulk close resolved 4.4 issues

        Show
        Steve Rowe added a comment - Bulk close resolved 4.4 issues
        Robert Muir made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 5.0 [ 12321663 ]
        Fix Version/s 4.4 [ 12324323 ]
        Resolution Fixed [ 1 ]
        Hide
        Robert Muir added a comment -

        I committed this: thanks for reporting this John!

        Show
        Robert Muir added a comment - I committed this: thanks for reporting this John!
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] rmuir
        http://svn.apache.org/viewvc?view=revision&revision=1485041

        LUCENE-4992: CustomScoreQuery does not work with arbitrary queries: scoringQueries must match every document

        Show
        Commit Tag Bot added a comment - [branch_4x commit] rmuir http://svn.apache.org/viewvc?view=revision&revision=1485041 LUCENE-4992 : CustomScoreQuery does not work with arbitrary queries: scoringQueries must match every document
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] rmuir
        http://svn.apache.org/viewvc?view=revision&revision=1485040

        LUCENE-4992: CustomScoreQuery does not work with arbitrary queries: scoringQueries must match every document

        Show
        Commit Tag Bot added a comment - [trunk commit] rmuir http://svn.apache.org/viewvc?view=revision&revision=1485040 LUCENE-4992 : CustomScoreQuery does not work with arbitrary queries: scoringQueries must match every document
        Hide
        John Wang added a comment -

        makes senses, agreed.

        Show
        John Wang added a comment - makes senses, agreed.
        Hide
        Robert Muir added a comment -

        I dont think we should add this check (because only broken consumers would trigger it).

        Maybe an assert is ok though.

        Show
        Robert Muir added a comment - I dont think we should add this check (because only broken consumers would trigger it). Maybe an assert is ok though.
        Hide
        John Wang added a comment -

        Thanks Rob for the patch!
        This makes the api much safer.

        I am inclined to think option 2) above with the NO_MORE_DOCS check is a still good thing to do for BooleanScorer2 (unrelated to CustomScoreQuery).

        What do you think?

        =John

        Show
        John Wang added a comment - Thanks Rob for the patch! This makes the api much safer. I am inclined to think option 2) above with the NO_MORE_DOCS check is a still good thing to do for BooleanScorer2 (unrelated to CustomScoreQuery). What do you think? =John
        Robert Muir made changes -
        Attachment LUCENE-4992.patch [ 12582797 ]
        Hide
        Robert Muir added a comment -

        Here's a patch to change the signature to FunctionQuery to prevent the trap.

        Show
        Robert Muir added a comment - Here's a patch to change the signature to FunctionQuery to prevent the trap.
        Hide
        John Wang added a comment -

        Hey Rob:

        Just verified your suggestion works in our env.

        Feel free to resolve this.

        -john

        Show
        John Wang added a comment - Hey Rob: Just verified your suggestion works in our env. Feel free to resolve this. -john
        Hide
        John Wang added a comment -

        Oh! that's nice!
        We will just do that!

        Thanks!

        Show
        John Wang added a comment - Oh! that's nice! We will just do that! Thanks!
        Hide
        Robert Muir added a comment -

        Another nice thing about the last approach (actually, you can do it today to fix your bug), is that you can always pass an arbitrary query anyway:

        new FunctionQuery(new QueryValueSource(MyBooleanQueryOrWhatever, 5.0f))

        The QueryValueSource has that additional argument that lets you specify what the score (in this example: 5.0f) should be for documents that don't match, so it removes any ambiguity.

        Show
        Robert Muir added a comment - Another nice thing about the last approach (actually, you can do it today to fix your bug), is that you can always pass an arbitrary query anyway: new FunctionQuery(new QueryValueSource(MyBooleanQueryOrWhatever, 5.0f)) The QueryValueSource has that additional argument that lets you specify what the score (in this example: 5.0f) should be for documents that don't match, so it removes any ambiguity.
        Hide
        John Wang added a comment -

        Thanks Rob for following up!
        I think the right thing to do would be the third option, that is the least intrusive while keeping the functionality consistent. And you are absolutely right, by doing the hacks I am suggesting would provide unexpected behavior, which would be worse.
        I will follow up on my end to make sure we would be ok with this change.

        -John

        Show
        John Wang added a comment - Thanks Rob for following up! I think the right thing to do would be the third option, that is the least intrusive while keeping the functionality consistent. And you are absolutely right, by doing the hacks I am suggesting would provide unexpected behavior, which would be worse. I will follow up on my end to make sure we would be ok with this change. -John
        Hide
        Robert Muir added a comment -

        John I see: so I think the whole design of this thing doesnt work today for your use case (where valSrcScorer is BooleanScorer2).

        It seems implicit in this CustomScoreQuery's implementation that the value source scorers will match all documents, yet nothing about the API enforces this, instead it takes arbitrary Query (but won't actually work correctly today with arbitrary queries!):

          public CustomScoreQuery(Query subQuery, Query... scoringQueries) {
        

        I feel like one of three things should happen:
        1. Change behavior of CustomScoreQuery to act more conjunction-like as you suggest. Though this means this query would be doing a significantly different thing than its current javadocs describe.
        2. Keep the behavior of today, except add NO_MORE_DOCS checks. But this can be confusing too, e.g. what would vScores[i] contain for an exhausted valSrcScorer?
        2. Current behavior, but scoringQueries changed to FunctionQuery to be type-safe.

        Any ideas?

        Show
        Robert Muir added a comment - John I see: so I think the whole design of this thing doesnt work today for your use case (where valSrcScorer is BooleanScorer2). It seems implicit in this CustomScoreQuery's implementation that the value source scorers will match all documents, yet nothing about the API enforces this, instead it takes arbitrary Query (but won't actually work correctly today with arbitrary queries!): public CustomScoreQuery(Query subQuery, Query... scoringQueries) { I feel like one of three things should happen: 1. Change behavior of CustomScoreQuery to act more conjunction-like as you suggest. Though this means this query would be doing a significantly different thing than its current javadocs describe. 2. Keep the behavior of today, except add NO_MORE_DOCS checks. But this can be confusing too, e.g. what would vScores [i] contain for an exhausted valSrcScorer? 2. Current behavior, but scoringQueries changed to FunctionQuery to be type-safe. Any ideas?
        Hide
        Robert Muir added a comment -

        I'll review CustomScoreQuery and see if its doing the right thing: its definitely possible its doing the wrong thing. Thanks for reporting this.

        Show
        Robert Muir added a comment - I'll review CustomScoreQuery and see if its doing the right thing: its definitely possible its doing the wrong thing. Thanks for reporting this.
        Hide
        John Wang added a comment -

        Forgot to mention, in my test, the valSrcScorer is an instance of BooleanScorer2.

        -John

        Show
        John Wang added a comment - Forgot to mention, in my test, the valSrcScorer is an instance of BooleanScorer2. -John
        Hide
        John Wang added a comment -

        Rob, I dug into the code a little more, seems like the better fix would be in CustomScoreQuery, line 309:

        valSrcScorer.advance(doc);

        what I am finding is that valSrcScorer impl when advance is called, can set the inner docid to NO_MORE_DOCS but that is not retrieved, and instead the doc returned previously from subQueryScorer is used.

        By changing the line to:
        doc = valSrcScorer.advance(doc);

        also fixes the problem. I am not sure if that is the right thing to do though.

        -John

        Show
        John Wang added a comment - Rob, I dug into the code a little more, seems like the better fix would be in CustomScoreQuery, line 309: valSrcScorer.advance(doc); what I am finding is that valSrcScorer impl when advance is called, can set the inner docid to NO_MORE_DOCS but that is not retrieved, and instead the doc returned previously from subQueryScorer is used. By changing the line to: doc = valSrcScorer.advance(doc); also fixes the problem. I am not sure if that is the right thing to do though. -John
        Hide
        John Wang added a comment -

        re: its not clear that the maybe the problem is calling score() on a scorer that already returned NO_MORE_DOCS (which is undefined)

        I stepped thru my test program and that is exactly what is happening.

        Show
        John Wang added a comment - re: its not clear that the maybe the problem is calling score() on a scorer that already returned NO_MORE_DOCS (which is undefined) I stepped thru my test program and that is exactly what is happening.
        Hide
        John Wang added a comment -

        Hi Rob:

        I do have a test that reproduces this and with the patch it is fixed. (I also made sure all tests pass in lucene)

        Unfortunately this is very difficult to reproduce. I was only able to reproduce it with our in house query builder and data corpus.

        -John

        Show
        John Wang added a comment - Hi Rob: I do have a test that reproduces this and with the patch it is fixed. (I also made sure all tests pass in lucene) Unfortunately this is very difficult to reproduce. I was only able to reproduce it with our in house query builder and data corpus. -John
        Hide
        Robert Muir added a comment -

        Do you have a test John? its not clear that the maybe the problem is calling score() on a scorer that already returned NO_MORE_DOCS (which is undefined)

        Show
        Robert Muir added a comment - Do you have a test John? its not clear that the maybe the problem is calling score() on a scorer that already returned NO_MORE_DOCS (which is undefined)
        John Wang made changes -
        Field Original Value New Value
        Attachment patch.diff [ 12582496 ]
        Hide
        John Wang added a comment -

        in DisjunctionSumScorer, line 96:

        private void afterNext() throws IOException {
        final Scorer sub = subScorers[0];
        doc = sub.docID();
        if (doc == NO_MORE_DOCS)

        { nrMatchers = Integer.MAX_VALUE; // stop looping <--- this is set }

        else

        { score = sub.score(); nrMatchers = 1; countMatches(1); countMatches(2); }

        }

        and in BooleanScorer2, line 167:
        coordinator.nrMatchers += super.nrMatchers;

        and then it breaks on line 312.

        Attached is a patch.

        Show
        John Wang added a comment - in DisjunctionSumScorer, line 96: private void afterNext() throws IOException { final Scorer sub = subScorers [0] ; doc = sub.docID(); if (doc == NO_MORE_DOCS) { nrMatchers = Integer.MAX_VALUE; // stop looping <--- this is set } else { score = sub.score(); nrMatchers = 1; countMatches(1); countMatches(2); } } and in BooleanScorer2, line 167: coordinator.nrMatchers += super.nrMatchers; and then it breaks on line 312. Attached is a patch.
        John Wang created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            John Wang
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development