Lucene - Core
  1. Lucene - Core
  2. LUCENE-912

DisjunctionMaxScorer.skipTo has bug that keeps it from skipping

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.1
    • Fix Version/s: 2.2
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      Patch Available

      Description

      as reported on the mailing list, DisjunctionMaxScorer.skipTo is broken if called before next in some situations...

      http://www.nabble.com/Potential-issue-with-DisjunctionMaxScorer-tf3846366.html#a10894987

      1. lucene-912.patch
        1 kB
        Michael Busch
      2. dismax_skipto.patch
        2 kB
        Hoss Man
      3. checkTwoCallsToScore.patch
        4 kB
        Doron Cohen
      4. checkTwoCallsToScore.patch
        4 kB
        Doron Cohen

        Issue Links

          Activity

          Hide
          Doron Cohen added a comment -

          I think you're right, and actually wondered about the same thing while verifying that fix: "why return in next() but not in skipTo()?" - I believe this is the right thing to do because at the initialization (ie in add(Scorer)) each added sub-scorer is advanced to its first match. So the first time that next() is called, sub-scorers are already "on the right spot", and not so for the first time skipTo() is called, because skipTo does not necessarily goes to the first match.

          Show
          Doron Cohen added a comment - I think you're right, and actually wondered about the same thing while verifying that fix: "why return in next() but not in skipTo()?" - I believe this is the right thing to do because at the initialization (ie in add(Scorer)) each added sub-scorer is advanced to its first match. So the first time that next() is called, sub-scorers are already "on the right spot", and not so for the first time skipTo() is called, because skipTo does not necessarily goes to the first match.
          Hide
          Paul Elschot added a comment -

          > > I'm not sure, but the patch allows to increment all generators right after the first time, and that might cause it to make a mistake in the score computation for its first doc.

          This is from code inspection only, I have no test cases showing wrong behaviour:
          What I meant is that during DisjunctionMaxScorer.next() initially the subscorers have next() called on them, so they are at their first docs, and then during next() (with the return statement deleted) the "generators are incremented again" even when at firstTime. That means that the score() might be computed on the wrong subscorer docs after the first call to next().

          But this issue is about DisjunctionMaxScorer.skipTo(), and removing the firstTime return statement there is correct I think.

          Show
          Paul Elschot added a comment - > > I'm not sure, but the patch allows to increment all generators right after the first time, and that might cause it to make a mistake in the score computation for its first doc. This is from code inspection only, I have no test cases showing wrong behaviour: What I meant is that during DisjunctionMaxScorer.next() initially the subscorers have next() called on them, so they are at their first docs, and then during next() (with the return statement deleted) the "generators are incremented again" even when at firstTime. That means that the score() might be computed on the wrong subscorer docs after the first call to next(). But this issue is about DisjunctionMaxScorer.skipTo(), and removing the firstTime return statement there is correct I think.
          Hide
          Doron Cohen added a comment -

          Committed the original lucene-912.patch by Hoss.
          Thanks Hoss and Sudaakeran B. !

          Show
          Doron Cohen added a comment - Committed the original lucene-912.patch by Hoss. Thanks Hoss and Sudaakeran B. !
          Hide
          Doron Cohen added a comment -

          Actually I changed my mind.... (and saw your reply just before clicking "Add" on this comment):
          Taking back my last comment - with Michael's fix there are no extra cycles in the expected use of score().
          If we try to avoid re-computation in case score() is called twice for same doc, there would be an additional if() for every call to score().

          I think I will take it from here to a new issue and handle these two separately.

          Show
          Doron Cohen added a comment - Actually I changed my mind.... (and saw your reply just before clicking "Add" on this comment): Taking back my last comment - with Michael's fix there are no extra cycles in the expected use of score(). If we try to avoid re-computation in case score() is called twice for same doc, there would be an additional if() for every call to score(). I think I will take it from here to a new issue and handle these two separately.
          Hide
          Michael Busch added a comment -

          Yes I agree. Keeping the score in BooleanScorer until next() or skipTo() is
          called is a good optimization to avoid recomputation. I guess I was just
          curios about the reasons why the tests failed.

          Show
          Michael Busch added a comment - Yes I agree. Keeping the score in BooleanScorer until next() or skipTo() is called is a good optimization to avoid recomputation. I guess I was just curios about the reasons why the tests failed.
          Hide
          Doron Cohen added a comment -

          This would recompute, but correctly.
          If this fix is just for sanity, ie main search methods really call score() just once, then I guess this is good enough.
          I was thinking more on not recomputing score if it was already computed (for current doc).
          (That would require to also maintain the last score returned.)

          Show
          Doron Cohen added a comment - This would recompute, but correctly. If this fix is just for sanity, ie main search methods really call score() just once, then I guess this is good enough. I was thinking more on not recomputing score if it was already computed (for current doc). (That would require to also maintain the last score returned.)
          Hide
          Michael Busch added a comment -

          In BooleanScorer2.score() the coordinator is initialized: coordinator.initDoc();
          This results in Coordinator.nrMatchers = 0.

          Now look at the statements that change the value of nrMatchers in BooleanScorer.
          There are three of those. E. g. in countingDisjunctionSumScorer:
          if (this.doc() > lastScoredDoc)

          { lastScoredDoc = this.doc(); coordinator.nrMatchers += super.nrMatchers; }

          nrMatchers is only increased in case this.doc() is greater than lastScoredDoc.
          But if score() is called twice, than those values are equal. So we have to
          fix the if statement to
          if (this.doc() >= lastScoredDoc) {

          With this patch and dismax_skipto.patch now all tests pass (including your
          new ones, Doron).

          Show
          Michael Busch added a comment - In BooleanScorer2.score() the coordinator is initialized: coordinator.initDoc(); This results in Coordinator.nrMatchers = 0. Now look at the statements that change the value of nrMatchers in BooleanScorer. There are three of those. E. g. in countingDisjunctionSumScorer: if (this.doc() > lastScoredDoc) { lastScoredDoc = this.doc(); coordinator.nrMatchers += super.nrMatchers; } nrMatchers is only increased in case this.doc() is greater than lastScoredDoc. But if score() is called twice, than those values are equal. So we have to fix the if statement to if (this.doc() >= lastScoredDoc) { With this patch and dismax_skipto.patch now all tests pass (including your new ones, Doron).
          Hide
          Doron Cohen added a comment -

          Updating checkTwoCallsToScore.patch (unintended comment out in previous file).

          Show
          Doron Cohen added a comment - Updating checkTwoCallsToScore.patch (unintended comment out in previous file).
          Hide
          Doron Cohen added a comment -

          Attached adds to QueryUtils.check(Query q1, Searcher s):

          • score() stability test (two calls to score() should return same score)
          • first skipTo tests

          The score() stability tests has failures for BooleanQueries and for DisjunctionMaxQuery.

          I didn't look into the failures cause yet.

          Show
          Doron Cohen added a comment - Attached adds to QueryUtils.check(Query q1, Searcher s): score() stability test (two calls to score() should return same score) first skipTo tests The score() stability tests has failures for BooleanQueries and for DisjunctionMaxQuery. I didn't look into the failures cause yet.
          Hide
          Yonik Seeley added a comment -

          Yes, two calls to score() should yield the same score.
          What scorer was broken?

          Show
          Yonik Seeley added a comment - Yes, two calls to score() should yield the same score. What scorer was broken?
          Hide
          Doron Cohen added a comment -

          > dismax score calculation isn't incremental... it's all done during the call to score()).
          > Is there an issue I'm not thinking of?

          I was wondering why QueryUtils.checkSkipTo() did not expose that. But this is because that check skips doc by doc, always to docs found by the tested query.

          Enhancing that test reveals that for some scorers, two consecutive calls to score() return different scores (no next() or skipTo() calls in between).
          This seems like a bug.

          Show
          Doron Cohen added a comment - > dismax score calculation isn't incremental... it's all done during the call to score()). > Is there an issue I'm not thinking of? I was wondering why QueryUtils.checkSkipTo() did not expose that. But this is because that check skips doc by doc, always to docs found by the tested query. Enhancing that test reveals that for some scorers, two consecutive calls to score() return different scores (no next() or skipTo() calls in between). This seems like a bug.
          Hide
          Yonik Seeley added a comment -

          > I'm not sure, but the patch allows to increment all generators right after the first time, and that might cause it to make a mistake in the score computation for its first doc.

          If all the sub-scorers are on the correct document after skipTo is called (which looks to be the case), then everything should be OK when DisjunctionMaxScorer.score() is called after that (dismax score calculation isn't incremental... it's all done during the call to score()). Is there an issue I'm not thinking of?

          Show
          Yonik Seeley added a comment - > I'm not sure, but the patch allows to increment all generators right after the first time, and that might cause it to make a mistake in the score computation for its first doc. If all the sub-scorers are on the correct document after skipTo is called (which looks to be the case), then everything should be OK when DisjunctionMaxScorer.score() is called after that (dismax score calculation isn't incremental... it's all done during the call to score()). Is there an issue I'm not thinking of?
          Hide
          Paul Elschot added a comment -

          I'm not sure, but the patch allows to increment all generators right after the first time, and that might cause it to make a mistake in the score computation for its first doc.

          It might be worthwhile to make DisjunctionMaxScorer use the same specialized queue/heap as DisjunctionSumScorer.
          Factoring this out could lead to a common superclass that provides e.g. an array of Scorers that match the disjunction.
          At the same time the score computations of the subscorers could be delayed until an actual score value is needed.
          That is a rather drastic approach, but the specialized queue/heap of DisjunctionSumScorer might also help performance of DisjunctionMaxScorer

          Show
          Paul Elschot added a comment - I'm not sure, but the patch allows to increment all generators right after the first time, and that might cause it to make a mistake in the score computation for its first doc. It might be worthwhile to make DisjunctionMaxScorer use the same specialized queue/heap as DisjunctionSumScorer. Factoring this out could lead to a common superclass that provides e.g. an array of Scorers that match the disjunction. At the same time the score computations of the subscorers could be delayed until an actual score value is needed. That is a rather drastic approach, but the specialized queue/heap of DisjunctionSumScorer might also help performance of DisjunctionMaxScorer
          Hide
          Hoss Man added a comment -

          patch with test and fix ... it would be helpful if someone who understands scorers better then i do could take a look at this and make sure there isn't some obvious cases that aren't accounted for in the (existing or new) tests that are still obviously broken

          Show
          Hoss Man added a comment - patch with test and fix ... it would be helpful if someone who understands scorers better then i do could take a look at this and make sure there isn't some obvious cases that aren't accounted for in the (existing or new) tests that are still obviously broken

            People

            • Assignee:
              Doron Cohen
              Reporter:
              Hoss Man
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development