Lucene - Core
  1. Lucene - Core
  2. LUCENE-2838

ConstantScoreQuery should directly support wrapping Query and simply strip off scores

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Especially in MultiTermQuery rewrite modes we often simply need to strip off scores from Queries and make them constant score. Currently the code to do this looks quite ugly: new ConstantScoreQuery(new QueryWrapperFilter(query))

      As the name says, QueryWrapperFilter should make any other Query constant score, so why does it not take a Query as ctor param? This question was aldso asked quite often by my customers and is simply correct, if you think about it.

      Looking closer into the code, it is clear that this would also speed up MTQs:

      • One additional wrapping and method calls can be removed
      • Maybe we can even deprecate QueryWrapperFilter in 3.1 now (it's now only used in tests and the use-case for this class is not really available) and LUCENE-2831 does not need the stupid hack to make Simon's assertions pass
      • CSQ now supports out-of-order scoring and topLevel scoring, so a CSQ on top-level now directly feeds the Collector. For that a small trick is used: The score(Collector) calls are directly delegated and the scores are stripped by wrapping the setScorer() method in Collector

      During that I found a visibility bug in Scorer (LUCENE-2839): The method "boolean score(Collector collector, int max, int firstDocID)" should be public not protected, as its not solely intended to be overridden by subclasses and is called from other classes, too! This leads to no compiler bugs as the other classes that calls it is mainly BooleanScorer(2) and thats in same package, but visibility is wrong. I will open an issue for that and fix it at least in trunk where we have no backwards-requirement.

      1. LUCENE-2838.patch
        24 kB
        Uwe Schindler
      2. LUCENE-2838.patch
        20 kB
        Uwe Schindler
      3. LUCENE-2838.patch
        12 kB
        Uwe Schindler
      4. LUCENE-2838-no-topscorer-opt.patch
        18 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          Cleaned up patch:

          • removed a useless testcase, no longer needed
          • added test for CSQ, that checks equals and hashCode
          • code cleanup
          • javadocs

          I will commit this if nobody objects to 3.x and trunk. About deprecating QWF we should discuss in separate issues, maybe we can merge Filter and Query before 4.0!

          Show
          Uwe Schindler added a comment - Cleaned up patch: removed a useless testcase, no longer needed added test for CSQ, that checks equals and hashCode code cleanup javadocs I will commit this if nobody objects to 3.x and trunk. About deprecating QWF we should discuss in separate issues, maybe we can merge Filter and Query before 4.0!
          Hide
          Uwe Schindler added a comment -

          After thinking one day about it, I found some problems with the "collector hack" and this style of decorator pattern:

          • If you wrap multiple times, the setScorer() method in the wrapped collector may set the wrong scorer (you see this, if you wrap multiple ConstantScoreQueries on top of each other, then the boost of the inner one is returned. The problem is that the score(Collector) method inverts the decorator pattern.
          • The inner scorer (like BoolenScorer with its buckets) may set a different scorer in the collector than itsself that implements doc() different, so setting the ConstantScorer always as collector's scorer can lead to wrong results (we dont see this in the test, as no collector uses Scorer.doc(), only Scorer.score(), which returns constant).

          I changed the code so CSQ now passes always topScorer=false to Weight.scorer() of the wrapped query and does not overwrite score(Collector,...) methods. It still allows out-of-order collection. Now BooleanScorer2 is always used with MTQs.

          The question is, would the previous but broken optimization make sense for speed? Mike/Mark?

          Show
          Uwe Schindler added a comment - After thinking one day about it, I found some problems with the "collector hack" and this style of decorator pattern: If you wrap multiple times, the setScorer() method in the wrapped collector may set the wrong scorer (you see this, if you wrap multiple ConstantScoreQueries on top of each other, then the boost of the inner one is returned. The problem is that the score(Collector) method inverts the decorator pattern. The inner scorer (like BoolenScorer with its buckets) may set a different scorer in the collector than itsself that implements doc() different, so setting the ConstantScorer always as collector's scorer can lead to wrong results (we dont see this in the test, as no collector uses Scorer.doc(), only Scorer.score(), which returns constant). I changed the code so CSQ now passes always topScorer=false to Weight.scorer() of the wrapped query and does not overwrite score(Collector,...) methods. It still allows out-of-order collection. Now BooleanScorer2 is always used with MTQs. The question is, would the previous but broken optimization make sense for speed? Mike/Mark?
          Hide
          Uwe Schindler added a comment - - edited

          Attached is the correct solution of ConstantScoreQuery with a scorer that supports topScorer and directly delegates hit collection to its wrapped scorer. This enables use of the bucket-using BooleanScorer instead of BooleanScorer2 for MTQs.

          The test case verifies that the correct boosts are collected and queries wrapped two times with different boosts return the correct scores, too.

          Show
          Uwe Schindler added a comment - - edited Attached is the correct solution of ConstantScoreQuery with a scorer that supports topScorer and directly delegates hit collection to its wrapped scorer. This enables use of the bucket-using BooleanScorer instead of BooleanScorer2 for MTQs. The test case verifies that the correct boosts are collected and queries wrapped two times with different boosts return the correct scores, too.
          Hide
          Uwe Schindler added a comment -

          Sorry last patch was outdated...

          Show
          Uwe Schindler added a comment - Sorry last patch was outdated...
          Hide
          Robert Muir added a comment -

          I benchmarked the test (and wired the benchmark to use constant score boolean)...
          I think the changes are in the wind, but its definitely no slower... +1 from me.

          separately, at some point we should open an issue to re-assess the cutoff criteria for constant score auto (maybe after the termstate thing).
          we have been hacking on the boolean rewrites a lot and we want to make sure we default to the best performance!

          Show
          Robert Muir added a comment - I benchmarked the test (and wired the benchmark to use constant score boolean)... I think the changes are in the wind, but its definitely no slower... +1 from me. separately, at some point we should open an issue to re-assess the cutoff criteria for constant score auto (maybe after the termstate thing). we have been hacking on the boolean rewrites a lot and we want to make sure we default to the best performance!
          Hide
          Uwe Schindler added a comment -

          Committed trunk revision: 1054406

          Now backporting!

          Show
          Uwe Schindler added a comment - Committed trunk revision: 1054406 Now backporting!
          Hide
          Uwe Schindler added a comment -

          Committed 3.x revision: 1054412

          Thanks Robert for building heavy indexes and testing performance!

          Show
          Uwe Schindler added a comment - Committed 3.x revision: 1054412 Thanks Robert for building heavy indexes and testing performance!
          Hide
          Michael McCandless added a comment -

          separately, at some point we should open an issue to re-assess the cutoff criteria for constant score auto (maybe after the termstate thing).

          +1

          I've been wondering about those cutoffs for a while now...

          Show
          Michael McCandless added a comment - separately, at some point we should open an issue to re-assess the cutoff criteria for constant score auto (maybe after the termstate thing). +1 I've been wondering about those cutoffs for a while now...
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development