Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 4.8, 6.0
    • core/search
    • None
    • New

    Description

      This is just an exploratory patch ... still many nocommits, but I
      think it may be promising.

      I find the two booleans we pass to Weight.scorer confusing, because
      they really only apply to whoever will call score(Collector) (just
      IndexSearcher and BooleanScorer).

      The params are pointless for the vast majority of scorers, because
      very, very few query scorers really need to change how top-scoring is
      done, and those scorers can only score top-level (throw throw UOE
      from nextDoc/advance). It seems like these two types of scorers
      should be separately typed.

      Attachments

        1. LUCENE-5487.patch
          130 kB
          Michael McCandless
        2. LUCENE-5487.patch
          86 kB
          Michael McCandless
        3. LUCENE-5487.patch
          84 kB
          Michael McCandless

        Activity

          Patch. I made a new class called TopScorer (we can pick a better
          name...), separated out a Weight.topScorer, with the obvious default
          impl, and then removed the two boolean params from Weight.scorer.

          Tests pass.

          mikemccand Michael McCandless added a comment - Patch. I made a new class called TopScorer (we can pick a better name...), separated out a Weight.topScorer, with the obvious default impl, and then removed the two boolean params from Weight.scorer. Tests pass.
          uschindler Uwe Schindler added a comment -

          I completely agree. Will review the patch later!

          uschindler Uwe Schindler added a comment - I completely agree. Will review the patch later!
          uschindler Uwe Schindler added a comment -

          Hi Mike,
          I think, TopScorer.java is missing in your patch! Otherwise looks great. You also implemented ConstantScorer's crazy wrapping for TopScorer. Now much easier to understand!
          Uwe

          uschindler Uwe Schindler added a comment - Hi Mike, I think, TopScorer.java is missing in your patch! Otherwise looks great. You also implemented ConstantScorer's crazy wrapping for TopScorer. Now much easier to understand! Uwe

          Thanks Uwe, here's a new patch that includes TopScorer

          mikemccand Michael McCandless added a comment - Thanks Uwe, here's a new patch that includes TopScorer
          uschindler Uwe Schindler added a comment -

          Thanks! The javadocs of TopScorer are outdated. The firstDocId is no longer passed. I don't like this change, because in the default impl inside Weight.java, we now need the additional code to check if scorer is still on -1 or not. You added a comment about the setScorer, too. Mabye there is room for improvements, by:

          • documenting API in a better way (sorry the crazyness of this score(Collector, maxDoc) stuff is horrible to understand, especially, if you want to implement it correctly!)
          • possibly move some code away, so score(Collector) does not need to do the checks on the underlying sequential scorer

          Otherwise, I like the 2 separate classes!

          Another idea I had: we can make TopScorer an interface. So Scorers that support topScoring just need to implement the interface - code looks like before... Have not thought more about it!

          uschindler Uwe Schindler added a comment - Thanks! The javadocs of TopScorer are outdated. The firstDocId is no longer passed. I don't like this change, because in the default impl inside Weight.java, we now need the additional code to check if scorer is still on -1 or not. You added a comment about the setScorer, too. Mabye there is room for improvements, by: documenting API in a better way (sorry the crazyness of this score(Collector, maxDoc) stuff is horrible to understand, especially, if you want to implement it correctly!) possibly move some code away, so score(Collector) does not need to do the checks on the underlying sequential scorer Otherwise, I like the 2 separate classes! Another idea I had: we can make TopScorer an interface. So Scorers that support topScoring just need to implement the interface - code looks like before... Have not thought more about it!
          paul.elschot@xs4all.nl Paul Elschot added a comment -

          I like this, lots of simplifications.

          About this nocommit:

          + // nocommit add test verifying that BQ inside BQ can get BS1
          + // not BS2 like today

          Iirc a BS1 inside a BS1 will work correctly because both use the same doc intervals in which the unordered docs can occur.
          I do not recall anyone reporting a problem with this with nested BQ's, but I never tested it either.

          And the doc interval might be increased (2048 now) to try and have even better BS1 performance.

          paul.elschot@xs4all.nl Paul Elschot added a comment - I like this, lots of simplifications. About this nocommit: + // nocommit add test verifying that BQ inside BQ can get BS1 + // not BS2 like today Iirc a BS1 inside a BS1 will work correctly because both use the same doc intervals in which the unordered docs can occur. I do not recall anyone reporting a problem with this with nested BQ's, but I never tested it either. And the doc interval might be increased (2048 now) to try and have even better BS1 performance.

          The javadocs of TopScorer are outdated. The firstDocId is no longer passed.

          Woops I'll fix.

          I don't like this change, because in the default impl inside Weight.java, we now need the additional code to check if scorer is still on -1 or not.

          Yeah ... not sure what to do about it. It became hard to know the firstDocID since consumers of TopScorer can't .nextDoc. Maybe there is another way ...

          (sorry the crazyness of this score(Collector, maxDoc) stuff is horrible to understand, especially, if you want to implement it correctly!)

          It is ... and I think we only have this version (the one that takes 2nd param, int max) for the "BS1 embeds BS1" case? Maybe we don't need to optimize this case? (Maybe a BQ embedded a BQ should rewrite itself to a single BQ, if somehow we could account for the proper coord scoring).

          Another idea I had: we can make TopScorer an interface

          That's a neat idea! So then e.g. IS.search (and BS1 embeds BS1 case) would check if the Scorer it pulled impls TopScorer, and if so, cast it and call .score(Collector); else, it would fallback to the default impl.

          mikemccand Michael McCandless added a comment - The javadocs of TopScorer are outdated. The firstDocId is no longer passed. Woops I'll fix. I don't like this change, because in the default impl inside Weight.java, we now need the additional code to check if scorer is still on -1 or not. Yeah ... not sure what to do about it. It became hard to know the firstDocID since consumers of TopScorer can't .nextDoc. Maybe there is another way ... (sorry the crazyness of this score(Collector, maxDoc) stuff is horrible to understand, especially, if you want to implement it correctly!) It is ... and I think we only have this version (the one that takes 2nd param, int max) for the "BS1 embeds BS1" case? Maybe we don't need to optimize this case? (Maybe a BQ embedded a BQ should rewrite itself to a single BQ, if somehow we could account for the proper coord scoring). Another idea I had: we can make TopScorer an interface That's a neat idea! So then e.g. IS.search (and BS1 embeds BS1 case) would check if the Scorer it pulled impls TopScorer, and if so, cast it and call .score(Collector); else, it would fallback to the default impl.

          Iirc a BS1 inside a BS1 will work correctly because both use the same doc intervals in which the unordered docs can occur.

          I think today if you nest two BQs, e.g. all SHOULD clauses, the outer BQ will use BS2 and the inner one will use BS1. This is because BQ's Weight.scorer passes "true" for topScorer when it pulls the sub-scorers.

          But, the patch should fix this I think, so that BS1 sub-scorer can be used. So the nocommit is to make a test confirming this is really happening... (it's hairy!).

          mikemccand Michael McCandless added a comment - Iirc a BS1 inside a BS1 will work correctly because both use the same doc intervals in which the unordered docs can occur. I think today if you nest two BQs, e.g. all SHOULD clauses, the outer BQ will use BS2 and the inner one will use BS1. This is because BQ's Weight.scorer passes "true" for topScorer when it pulls the sub-scorers. But, the patch should fix this I think, so that BS1 sub-scorer can be used. So the nocommit is to make a test confirming this is really happening... (it's hairy!).

          Commit 1573830 from mikemccand in branch 'dev/branches/lucene5487'
          [ https://svn.apache.org/r1573830 ]

          LUCENE-5487: commit current patch

          jira-bot ASF subversion and git services added a comment - Commit 1573830 from mikemccand in branch 'dev/branches/lucene5487' [ https://svn.apache.org/r1573830 ] LUCENE-5487 : commit current patch

          I made a branch & committed the last patch: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene5487

          mikemccand Michael McCandless added a comment - I made a branch & committed the last patch: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene5487

          Commit 1574169 from mikemccand in branch 'dev/branches/lucene5487'
          [ https://svn.apache.org/r1574169 ]

          LUCENE-5487: also wrap TopScorer in AssertingWeight

          jira-bot ASF subversion and git services added a comment - Commit 1574169 from mikemccand in branch 'dev/branches/lucene5487' [ https://svn.apache.org/r1574169 ] LUCENE-5487 : also wrap TopScorer in AssertingWeight

          Commit 1574188 from mikemccand in branch 'dev/branches/lucene5487'
          [ https://svn.apache.org/r1574188 ]

          LUCENE-5487: add test case verifying we can embed BS1 inside BS1

          jira-bot ASF subversion and git services added a comment - Commit 1574188 from mikemccand in branch 'dev/branches/lucene5487' [ https://svn.apache.org/r1574188 ] LUCENE-5487 : add test case verifying we can embed BS1 inside BS1

          Commit 1574434 from mikemccand in branch 'dev/branches/lucene5487'
          [ https://svn.apache.org/r1574434 ]

          LUCENE-5487: add test case verifying a query time join inside a BQ still gets the optimized top-level scorer (it doesn't today); fixed nocommits

          jira-bot ASF subversion and git services added a comment - Commit 1574434 from mikemccand in branch 'dev/branches/lucene5487' [ https://svn.apache.org/r1574434 ] LUCENE-5487 : add test case verifying a query time join inside a BQ still gets the optimized top-level scorer (it doesn't today); fixed nocommits

          Commit 1575057 from mikemccand in branch 'dev/branches/lucene5487'
          [ https://svn.apache.org/r1575057 ]

          LUCENE-5487: add TopScorers to FilteredQuery too; fix Solr; resolve all nocommits

          jira-bot ASF subversion and git services added a comment - Commit 1575057 from mikemccand in branch 'dev/branches/lucene5487' [ https://svn.apache.org/r1575057 ] LUCENE-5487 : add TopScorers to FilteredQuery too; fix Solr; resolve all nocommits

          Rob suggested BulkScorer as a better name than TopScorer ... I like it ... I'll rename it.

          mikemccand Michael McCandless added a comment - Rob suggested BulkScorer as a better name than TopScorer ... I like it ... I'll rename it.

          Commit 1575234 from mikemccand in branch 'dev/branches/lucene5487'
          [ https://svn.apache.org/r1575234 ]

          LUCENE-5487: rename TopScorer -> BulkScorer

          jira-bot ASF subversion and git services added a comment - Commit 1575234 from mikemccand in branch 'dev/branches/lucene5487' [ https://svn.apache.org/r1575234 ] LUCENE-5487 : rename TopScorer -> BulkScorer

          Commit 1575397 from mikemccand in branch 'dev/branches/lucene5487'
          [ https://svn.apache.org/r1575397 ]

          LUCENE-5487: merge trunk

          jira-bot ASF subversion and git services added a comment - Commit 1575397 from mikemccand in branch 'dev/branches/lucene5487' [ https://svn.apache.org/r1575397 ] LUCENE-5487 : merge trunk

          Applyable patch from the branch (generated by diffSources.py), I think it's ready.

          mikemccand Michael McCandless added a comment - Applyable patch from the branch (generated by diffSources.py), I think it's ready.

          I think this (Weight.scorer) is an expert enough API that we can fix it in 4.8 as well?

          mikemccand Michael McCandless added a comment - I think this (Weight.scorer) is an expert enough API that we can fix it in 4.8 as well?
          rcmuir Robert Muir added a comment -

          I think this (Weight.scorer) is an expert enough API that we can fix it in 4.8 as well?

          Definitely, I havent had a chance to review the patch but this seems good to do.

          rcmuir Robert Muir added a comment - I think this (Weight.scorer) is an expert enough API that we can fix it in 4.8 as well? Definitely, I havent had a chance to review the patch but this seems good to do.
          uschindler Uwe Schindler added a comment -

          I think we can backport this!

          uschindler Uwe Schindler added a comment - I think we can backport this!
          rcmuir Robert Muir added a comment -

          In this case no problem, but just as an FYI, if you have trunk/ and branch/, you always want to run the differ outside of both (this way the patch prefixes are the same: this one cant be applied by any patch tool).

          But this is no problem for me to review, i can just switch to your branch to see the context!

          rcmuir Robert Muir added a comment - In this case no problem, but just as an FYI, if you have trunk/ and branch/, you always want to run the differ outside of both (this way the patch prefixes are the same: this one cant be applied by any patch tool). But this is no problem for me to review, i can just switch to your branch to see the context!
          rcmuir Robert Muir added a comment -
          • docs for Weight.scoresDocsOutOfOrder() should refer to bulkScorer() instead of scorer()
          • a TODO should be added for BooleanWeight.scoresDocsOutOfOrder: its "out of sync" in the sense that it only checks for any required clause, but e.g. if someone has minNrShouldMatch > 1, it will lie and say its out of order, but then go and do in-order scoring (but with a slower collector). This is not new and not caused by your patch... and really it would be ideal if somehow the logic was in one place rather than duplicated.
          • should FakeScorer really not take the real Weight anymore? I don't know how useful it is, but its wierd that its null, since the Collector actually sees this thing via setScorer: if its not going to be supported then i think it should override getWeight to explicitly throw UOE?
          • what is the purpose of a LeapFrogBulkScorer? It seems to just use two in-order scorers, i dont understand its purpose. Is this supposed to be a code specialization? If so, how much faster is it than just... not doing that and using an in-order scorer in this case? (it seems maybe the latter way is actually faster, as you get the correct collector, same issue as BooleanWeight with minNrShouldMatch as mentioned above)
          rcmuir Robert Muir added a comment - docs for Weight.scoresDocsOutOfOrder() should refer to bulkScorer() instead of scorer() a TODO should be added for BooleanWeight.scoresDocsOutOfOrder: its "out of sync" in the sense that it only checks for any required clause, but e.g. if someone has minNrShouldMatch > 1, it will lie and say its out of order, but then go and do in-order scoring (but with a slower collector). This is not new and not caused by your patch... and really it would be ideal if somehow the logic was in one place rather than duplicated. should FakeScorer really not take the real Weight anymore? I don't know how useful it is, but its wierd that its null, since the Collector actually sees this thing via setScorer: if its not going to be supported then i think it should override getWeight to explicitly throw UOE? what is the purpose of a LeapFrogBulkScorer? It seems to just use two in-order scorers, i dont understand its purpose. Is this supposed to be a code specialization? If so, how much faster is it than just... not doing that and using an in-order scorer in this case? (it seems maybe the latter way is actually faster, as you get the correct collector, same issue as BooleanWeight with minNrShouldMatch as mentioned above)

          Thanks Rob!

          In this case no problem, but just as an FYI, if you have trunk/ and branch/, you always want to run the differ outside of both (this way the patch prefixes are the same: this one cant be applied by any patch tool).

          I actually did that at first but for some reason I thought the resulting patch file was wrong! Next time I'll do it like that.

          docs for Weight.scoresDocsOutOfOrder() should refer to bulkScorer() instead of scorer()

          Oh yeah, I'll fix.

          a TODO should be added for BooleanWeight.scoresDocsOutOfOrder

          I fixed this and added a simple test, on the branch.

          should FakeScorer really not take the real Weight anymore? I don't know how useful it is, but its wierd that its null, since the Collector actually sees this thing via setScorer: if its not going to be supported then i think it should override getWeight to explicitly throw UOE?

          It seems weird returning a real Weight when everything else is fake, but I guess we can just leave it as it was (in BooleanQuery)? All the other FakeScorers seem to do the null Weight thing, but I agree if we do that we should just override getWeight to throw UOE.

          what is the purpose of a LeapFrogBulkScorer? It seems to just use two in-order scorers, i dont understand its purpose. Is this supposed to be a code specialization?

          It was there before, but I think it's just code specialization ... I'll just nuke it and let Weight.bulkScorer do the default impl.

          mikemccand Michael McCandless added a comment - Thanks Rob! In this case no problem, but just as an FYI, if you have trunk/ and branch/, you always want to run the differ outside of both (this way the patch prefixes are the same: this one cant be applied by any patch tool). I actually did that at first but for some reason I thought the resulting patch file was wrong! Next time I'll do it like that. docs for Weight.scoresDocsOutOfOrder() should refer to bulkScorer() instead of scorer() Oh yeah, I'll fix. a TODO should be added for BooleanWeight.scoresDocsOutOfOrder I fixed this and added a simple test, on the branch. should FakeScorer really not take the real Weight anymore? I don't know how useful it is, but its wierd that its null, since the Collector actually sees this thing via setScorer: if its not going to be supported then i think it should override getWeight to explicitly throw UOE? It seems weird returning a real Weight when everything else is fake, but I guess we can just leave it as it was (in BooleanQuery)? All the other FakeScorers seem to do the null Weight thing, but I agree if we do that we should just override getWeight to throw UOE. what is the purpose of a LeapFrogBulkScorer? It seems to just use two in-order scorers, i dont understand its purpose. Is this supposed to be a code specialization? It was there before, but I think it's just code specialization ... I'll just nuke it and let Weight.bulkScorer do the default impl.

          Commit 1576066 from mikemccand in branch 'dev/branches/lucene5487'
          [ https://svn.apache.org/r1576066 ]

          LUCENE-5487: add feedback from Rob

          jira-bot ASF subversion and git services added a comment - Commit 1576066 from mikemccand in branch 'dev/branches/lucene5487' [ https://svn.apache.org/r1576066 ] LUCENE-5487 : add feedback from Rob

          Commit 1576096 from mikemccand in branch 'dev/branches/lucene5487'
          [ https://svn.apache.org/r1576096 ]

          LUCENE-5487: throw OUE from FakeScorer.getWeight

          jira-bot ASF subversion and git services added a comment - Commit 1576096 from mikemccand in branch 'dev/branches/lucene5487' [ https://svn.apache.org/r1576096 ] LUCENE-5487 : throw OUE from FakeScorer.getWeight
          rcmuir Robert Muir added a comment -

          +1

          rcmuir Robert Muir added a comment - +1

          Commit 1576273 from mikemccand in branch 'dev/branches/lucene5487'
          [ https://svn.apache.org/r1576273 ]

          LUCENE-5487: consolidate the FakeScorers within one package

          jira-bot ASF subversion and git services added a comment - Commit 1576273 from mikemccand in branch 'dev/branches/lucene5487' [ https://svn.apache.org/r1576273 ] LUCENE-5487 : consolidate the FakeScorers within one package

          Commit 1576274 from mikemccand in branch 'dev/branches/lucene5487'
          [ https://svn.apache.org/r1576274 ]

          LUCENE-5487: consolidate the FakeScorers within one package

          jira-bot ASF subversion and git services added a comment - Commit 1576274 from mikemccand in branch 'dev/branches/lucene5487' [ https://svn.apache.org/r1576274 ] LUCENE-5487 : consolidate the FakeScorers within one package

          Commit 1576473 from mikemccand in branch 'dev/branches/lucene5487'
          [ https://svn.apache.org/r1576473 ]

          LUCENE-5487: merge trunk

          jira-bot ASF subversion and git services added a comment - Commit 1576473 from mikemccand in branch 'dev/branches/lucene5487' [ https://svn.apache.org/r1576473 ] LUCENE-5487 : merge trunk

          OK I think this is ready ... I'll reintegrate the branch, add CHANGES and commit! Thanks for the review Rob & Uwe!

          mikemccand Michael McCandless added a comment - OK I think this is ready ... I'll reintegrate the branch, add CHANGES and commit! Thanks for the review Rob & Uwe!

          Commit 1576487 from mikemccand in branch 'dev/trunk'
          [ https://svn.apache.org/r1576487 ]

          LUCENE-5487: separate Weight.bulkScorer and Weight.scorer

          jira-bot ASF subversion and git services added a comment - Commit 1576487 from mikemccand in branch 'dev/trunk' [ https://svn.apache.org/r1576487 ] LUCENE-5487 : separate Weight.bulkScorer and Weight.scorer

          Commit 1576492 from mikemccand in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1576492 ]

          LUCENE-5487: separate Weight.bulkScorer and Weight.scorer

          jira-bot ASF subversion and git services added a comment - Commit 1576492 from mikemccand in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1576492 ] LUCENE-5487 : separate Weight.bulkScorer and Weight.scorer

          Commit 1585967 from mikemccand@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1585967 ]

          LUCENE-5554: break out separate Weight.scoreRange/scoreAll so hotspot is relatively happy again after LUCENE-5487

          jira-bot ASF subversion and git services added a comment - Commit 1585967 from mikemccand@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1585967 ] LUCENE-5554 : break out separate Weight.scoreRange/scoreAll so hotspot is relatively happy again after LUCENE-5487

          Commit 1585968 from mikemccand@apache.org in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1585968 ]

          LUCENE-5554: break out separate Weight.scoreRange/scoreAll so hotspot is relatively happy again after LUCENE-5487

          jira-bot ASF subversion and git services added a comment - Commit 1585968 from mikemccand@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1585968 ] LUCENE-5554 : break out separate Weight.scoreRange/scoreAll so hotspot is relatively happy again after LUCENE-5487

          mikemccand could save me from starring at "broke out separate Weight.scoreRange/scoreAll methods" for a few mins, if it was clued by a comment mentions #hotspot

          mkhl Mikhail Khludnev added a comment - mikemccand could save me from starring at "broke out separate Weight.scoreRange/scoreAll methods" for a few mins, if it was clued by a comment mentions #hotspot

          Commit 1589416 from mikemccand in branch 'dev/trunk'
          [ https://svn.apache.org/r1589416 ]

          LUCENE-5487: add comment explaining hotspot voodoo

          jira-bot ASF subversion and git services added a comment - Commit 1589416 from mikemccand in branch 'dev/trunk' [ https://svn.apache.org/r1589416 ] LUCENE-5487 : add comment explaining hotspot voodoo

          Michael McCandless could save me from starring at "broke out separate Weight.scoreRange/scoreAll methods" for a few mins, if it was clued by a comment mentions #hotspot

          I committed a fix ...

          mikemccand Michael McCandless added a comment - Michael McCandless could save me from starring at "broke out separate Weight.scoreRange/scoreAll methods" for a few mins, if it was clued by a comment mentions #hotspot I committed a fix ...

          Commit 1589422 from mikemccand in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1589422 ]

          LUCENE-5487: add comment explaining hotspot voodoo

          jira-bot ASF subversion and git services added a comment - Commit 1589422 from mikemccand in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1589422 ] LUCENE-5487 : add comment explaining hotspot voodoo
          uschindler Uwe Schindler added a comment -

          Close issue after release of 4.8.0

          uschindler Uwe Schindler added a comment - Close issue after release of 4.8.0
          tomoko Tomoko Uchida added a comment -

          This issue was moved to GitHub issue: #6550.

          tomoko Tomoko Uchida added a comment - This issue was moved to GitHub issue: #6550 .

          People

            mikemccand Michael McCandless
            mikemccand Michael McCandless
            Votes:
            1 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: