Lucene - Core
  1. Lucene - Core
  2. LUCENE-2190

CustomScoreQuery (function query) is broken (due to per-segment searching)

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 3.1
    • Fix Version/s: 2.9.2, 3.0.1, 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Spinoff from here:

      http://lucene.markmail.org/message/psw2m3adzibaixbq

      With the cutover to per-segment searching, CustomScoreQuery is not really usable anymore, because the per-doc custom scoring method (customScore) receives a per-segment docID, yet there is no way to figure out which segment you are currently searching.

      I think to fix this we must also notify the subclass whenever a new segment is switched to. I think if we copy Collector.setNextReader, that would be sufficient. It would by default do nothing in CustomScoreQuery, but a subclass could override.

      1. LUCENE-2190-2-branch29.patch
        28 kB
        Uwe Schindler
      2. LUCENE-2190-2-branch30.patch
        34 kB
        Uwe Schindler
      3. LUCENE-2190-2-trunk.patch
        32 kB
        Uwe Schindler
      4. LUCENE-2190-2-branch30.patch
        33 kB
        Uwe Schindler
      5. LUCENE-2190-2-trunk.patch
        32 kB
        Uwe Schindler
      6. LUCENE-2190-2.patch
        32 kB
        Uwe Schindler
      7. LUCENE-2190-2.patch
        32 kB
        Uwe Schindler
      8. LUCENE-2190.patch
        11 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Patch attached, adding setNextReader to CustomScoreQuery, and a test case. Also fixed a couple latent test bugs, when run on indexes with more than one segment.

        Show
        Michael McCandless added a comment - Patch attached, adding setNextReader to CustomScoreQuery, and a test case. Also fixed a couple latent test bugs, when run on indexes with more than one segment.
        Hide
        Michael McCandless added a comment -

        Patch applies to 2.9.x.

        Show
        Michael McCandless added a comment - Patch applies to 2.9.x.
        Hide
        Uwe Schindler added a comment -

        The fix is invalid:
        Adding setNextReader to CustomScoreQuery makes the Query itsself stateful. This breaks when using together with e.g. ParallelMultiSearcher.
        As the package is experimental, I see no problem in changing the method signature of customScore to pass in the affected IndexReader (CustomScorer can do this)

        Show
        Uwe Schindler added a comment - The fix is invalid: Adding setNextReader to CustomScoreQuery makes the Query itsself stateful. This breaks when using together with e.g. ParallelMultiSearcher. As the package is experimental, I see no problem in changing the method signature of customScore to pass in the affected IndexReader (CustomScorer can do this)
        Hide
        Uwe Schindler added a comment -

        We can preserve backwards compatibility is the default impl with the new reader only passes to the deprecated old customScore function.

        I will provide a patch tomorrow.

        Show
        Uwe Schindler added a comment - We can preserve backwards compatibility is the default impl with the new reader only passes to the deprecated old customScore function. I will provide a patch tomorrow.
        Hide
        Uwe Schindler added a comment -

        During refactoring I found out:

        CustomScoreQuery is more broken: the rewrite() method is wrong, for now its not really a problem but when we change to per-segment rewrite (as Mike plans) its broken. Its even broken if you rewrite against one IndexReader and want to reuse the query later on another IndexReader. It rewrites all its subqueries and returns itsself, which is wrong: if one of the subqueries was rewritten it must return a new clone instance (like BooleanQuery). Also hashCode and equals ignore strict.

        Will provide patch later. Now everything seems to work correct.

        Show
        Uwe Schindler added a comment - During refactoring I found out: CustomScoreQuery is more broken: the rewrite() method is wrong, for now its not really a problem but when we change to per-segment rewrite (as Mike plans) its broken. Its even broken if you rewrite against one IndexReader and want to reuse the query later on another IndexReader. It rewrites all its subqueries and returns itsself, which is wrong: if one of the subqueries was rewritten it must return a new clone instance (like BooleanQuery). Also hashCode and equals ignore strict. Will provide patch later. Now everything seems to work correct.
        Hide
        Michael McCandless added a comment -

        Adding setNextReader to CustomScoreQuery makes the Query itsself stateful. This breaks when using together with e.g. ParallelMultiSearcher.

        Ugh, you're right. Passing the current IR to customScore method isn't great because then you have to a check per call (per doc scored) to detect a switch to a new reader.

        but when we change to per-segment rewrite (as Mike plans)

        I'm not quite "planning" that I was mulling it... because it's somewhat inefficient now that an MTQ uses the multi terms enum instead of doing it per segment. But: this is only done for the BQ rewrite, which is only done when the number of terms is smallish (with AUTO default rewrite), so I think in practice the actual cost is low?

        Show
        Michael McCandless added a comment - Adding setNextReader to CustomScoreQuery makes the Query itsself stateful. This breaks when using together with e.g. ParallelMultiSearcher. Ugh, you're right. Passing the current IR to customScore method isn't great because then you have to a check per call (per doc scored) to detect a switch to a new reader. but when we change to per-segment rewrite (as Mike plans) I'm not quite "planning" that I was mulling it... because it's somewhat inefficient now that an MTQ uses the multi terms enum instead of doing it per segment. But: this is only done for the BQ rewrite, which is only done when the number of terms is smallish (with AUTO default rewrite), so I think in practice the actual cost is low?
        Hide
        Uwe Schindler added a comment -

        Here a better solution. It now works like Filter's getDocIdSet: For customizing scores, you have to override the similar protected method getCustomScoreProvider(IndexReader) and return a subclass of CustomScoreProvider. The default delegates to the backwards layer.

        The semantics is now identical to filters: Each IndexReader of a segment gets its own calculator like its own DocIdSet in filters.

        Also fixes the following problems:

        • rewrite() was incorrectly implemented (now works like BooleanQuery.rewrite())
        • equals/hashCode ignored strict
        Show
        Uwe Schindler added a comment - Here a better solution. It now works like Filter's getDocIdSet: For customizing scores, you have to override the similar protected method getCustomScoreProvider(IndexReader) and return a subclass of CustomScoreProvider. The default delegates to the backwards layer. The semantics is now identical to filters: Each IndexReader of a segment gets its own calculator like its own DocIdSet in filters. Also fixes the following problems: rewrite() was incorrectly implemented (now works like BooleanQuery.rewrite()) equals/hashCode ignored strict
        Hide
        Uwe Schindler added a comment -

        Updated patch (forgot to add an IOException to getCustomScoreProvider and fixed test).

        Will backport after committing to 3.0 and 2.9 (argh).

        Show
        Uwe Schindler added a comment - Updated patch (forgot to add an IOException to getCustomScoreProvider and fixed test). Will backport after committing to 3.0 and 2.9 (argh).
        Hide
        Michael McCandless added a comment -

        Patch looks good!

        Only thing is I think we should revert this:

        @@ -382,7 +387,7 @@
         
             @Override
             public boolean scoresDocsOutOfOrder() {
        -      return false;
        +      return subQueryWeight.scoresDocsOutOfOrder();
             }
        

        We need to return false from here because CSQ always scores in order (it asks the sub scorers for in-order scorers).

        Show
        Michael McCandless added a comment - Patch looks good! Only thing is I think we should revert this: @@ -382,7 +387,7 @@ @Override public boolean scoresDocsOutOfOrder() { - return false ; + return subQueryWeight.scoresDocsOutOfOrder(); } We need to return false from here because CSQ always scores in order (it asks the sub scorers for in-order scorers).
        Hide
        Uwe Schindler added a comment -

        Here the patches for trunk (without deprecations) and 3.0 brach. 2.9 will be merged later. Merging from trunk -> 3.0 is not possible as TestCase heavily rewritten.

        Show
        Uwe Schindler added a comment - Here the patches for trunk (without deprecations) and 3.0 brach. 2.9 will be merged later. Merging from trunk -> 3.0 is not possible as TestCase heavily rewritten.
        Hide
        Michael McCandless added a comment -

        OK new patches look good Uwe! Thanks for fixing this

        Show
        Michael McCandless added a comment - OK new patches look good Uwe! Thanks for fixing this
        Hide
        Uwe Schindler added a comment -

        Updated patches without javadocs-warnings / fixed javadocs. In trunk the backwards branch needs to be patched, too (merge from 3.0 branch).

        Show
        Uwe Schindler added a comment - Updated patches without javadocs-warnings / fixed javadocs. In trunk the backwards branch needs to be patched, too (merge from 3.0 branch).
        Hide
        Uwe Schindler added a comment -

        Here the patch for 2.9

        Show
        Uwe Schindler added a comment - Here the patch for 2.9
        Hide
        Uwe Schindler added a comment -

        Committed 3.0 branch revision: 912383, 912389
        Committed trunk revision: 912386
        Committed 2.9 branch revision: 912390

        Thanks Mike for the help!

        Show
        Uwe Schindler added a comment - Committed 3.0 branch revision: 912383, 912389 Committed trunk revision: 912386 Committed 2.9 branch revision: 912390 Thanks Mike for the help!

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development