Lucene - Core / LUCENE-5245

ConstantScoreAutoRewrite rewrites prefix queries that don't match anything before query weight is calculated

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.4
    • Fix Version/s: 4.5, Trunk
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      ConstantScoreAutoRewrite rewrites prefix queries that don't match anything before the query weight is calculated. This dramatically changes the resulting score, which is a problem when comparing scores across different Lucene indexes, shards, and so on.

      1. LUCENE-5245.patch
        0.8 kB
        Nik Everett
      2. LUCENE-5245.patch
        2 kB
        Uwe Schindler
      3. LUCENE-5245.patch
        6 kB
        Uwe Schindler

        Activity

        Adrien Grand added a comment -

        4.5 release -> bulk close

        ASF subversion and git services added a comment -

        Commit 1526581 from Adrien Grand in branch 'dev/branches/lucene_solr_4_5'
        [ https://svn.apache.org/r1526581 ]

        LUCENE-5245: backport to lucene_4_5

        ASF subversion and git services added a comment -

        Commit 1526573 from Adrien Grand in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1526573 ]

        LUCENE-5245: backport to lucene_4_5

        ASF subversion and git services added a comment -

        Commit 1526571 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1526571 ]

        LUCENE-5245: backport to lucene_4_5

        Nik Everett added a comment -

        Thanks for jumping on this so quickly!

        Uwe Schindler added a comment -

        Thanks Nik!

        ASF subversion and git services added a comment -

        Commit 1526401 from Uwe Schindler in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1526401 ]

        Merged revision(s) 1526399 from lucene/dev/trunk:
        LUCENE-5245: Fix MultiTermQuery's constant score rewrites to always return a ConstantScoreQuery to make scoring consistent. Previously it returned an empty unwrapped BooleanQuery, if no terms were available, which has a different query norm

        ASF subversion and git services added a comment -

        Commit 1526399 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1526399 ]

        LUCENE-5245: Fix MultiTermQuery's constant score rewrites to always return a ConstantScoreQuery to make scoring consistent. Previously it returned an empty unwrapped BooleanQuery, if no terms were available, which has a different query norm

        Uwe Schindler added a comment -

        New patch including a test case that compares all 3 constant rewrites, and also all 3 constant rewrites with a non-matching MTQ (using a SHOULD clause with a dummy term, so the query norm can be checked to be identical).

        I will commit this tomorrow.

        Uwe Schindler added a comment -

        Here is a patch that fixes both issues!

        Michael McCandless: The issue only affects rewrites with 0 terms, so our shortcut is too aggressive. We return an empty BooleanQuery(true) in that case, which has a different query norm than ConstantScoreQuery, resulting in different scores. To be consistent we should return the same query type (ConstantScoreQuery for the constant rewrites). This has no speed impact, as the scorer is always empty.
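
The effect of the shortcut described above can be sketched without Lucene on the classpath. The block below is a simplified model, not the patch itself: `Clause`, `rewrite`, and the `fixed` flag are hypothetical stand-ins, and it assumes a DefaultSimilarity-style query norm of 1 / sqrt(sum of squared weights), where an empty BooleanQuery contributes 0 and a ConstantScoreQuery contributes its boost squared. The coord factor is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyRewriteSketch {
    /** Hypothetical stand-in for a rewritten clause; not a Lucene type. */
    interface Clause {
        double valueForNormalization();
    }

    /** Models an empty BooleanQuery: no sub-clauses, so it adds nothing to the sum. */
    static Clause emptyBoolean() {
        return () -> 0.0;
    }

    /** Models a ConstantScoreQuery: adds boost^2 whether or not it matches anything. */
    static Clause constantScore(double boost) {
        return () -> boost * boost;
    }

    /** Rewrites one prefix clause; fixed=true models the behaviour after the patch. */
    static Clause rewrite(double boost, boolean hasTerms, boolean fixed) {
        if (!hasTerms && !fixed) {
            return emptyBoolean(); // the too-aggressive shortcut
        }
        return constantScore(boost);
    }

    /** Query norm over all clauses, falling back to 1 when the sum is zero. */
    static double queryNorm(List<Clause> clauses) {
        double sum = 0.0;
        for (Clause c : clauses) {
            sum += c.valueForNormalization();
        }
        double norm = 1.0 / Math.sqrt(sum);
        return (Double.isInfinite(norm) || Double.isNaN(norm)) ? 1.0 : norm;
    }

    /** Norm for a two-clause query: one clause boosted by 20, one by 1. */
    static double norm(boolean fooHasTerms, boolean barHasTerms, boolean fixed) {
        List<Clause> clauses = new ArrayList<>();
        clauses.add(rewrite(20.0, fooHasTerms, fixed));
        clauses.add(rewrite(1.0, barHasTerms, fixed));
        return queryNorm(clauses);
    }

    public static void main(String[] args) {
        System.out.println("before, only foo has terms: " + norm(true, false, false));
        System.out.println("before, only bar has terms: " + norm(false, true, false));
        System.out.println("after,  only foo has terms: " + norm(true, false, true));
        System.out.println("after,  only bar has terms: " + norm(false, true, true));
    }
}
```

In this model the two "before" norms differ depending on which clause found terms, while the two "after" norms are identical, which is the consistency the fix is aiming for.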

        Uwe Schindler added a comment -

        ScoringRewrite#CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE has the same problem.

        Uwe Schindler added a comment -

        Ah sorry. It only applies for the case where no term is found. Yes, in that case the boost is missing and affects query norm!

        Thanks for opening the issue.

        Uwe Schindler added a comment -

        Your patch applies the constant scoring 2 times and also multiplies boost 2 times.

        Nik Everett added a comment -

        This fixes my problem but I'm not sure how to set up unit tests in Lucene.

        Nik Everett added a comment -

        The query norm applied to the constant score query changes. Say I had a query string like "foo:findm*^20 bar:findm*" and only foo had a result on shard 1 and only bar had a result on shard 2. Both end up with the same score because on shard 1 the query is rewritten to "foo:findm*^20" (norm = .05) and on shard 2 to "bar:findm*" (norm = 1).
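
The two norms quoted above can be reproduced with a few lines of arithmetic. This is a minimal sketch assuming the query norm is 1 / sqrt(sum of squared clause boosts) and that a clause with no matching terms drops out of the sum entirely; the class and method names are illustrative only, and the coord factor is ignored.

```java
public class ShardNormExample {
    /** Query norm for a given sum of squared weights. */
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    /** Score of a constant-score clause: its boost times the query norm. */
    static double score(double boost, double norm) {
        return boost * norm;
    }

    public static void main(String[] args) {
        double fooBoost = 20.0; // foo:findm*^20
        double barBoost = 1.0;  // bar:findm*

        // Shard 1: only the foo clause survives the rewrite.
        double normShard1 = queryNorm(fooBoost * fooBoost);
        // Shard 2: only the bar clause survives the rewrite.
        double normShard2 = queryNorm(barBoost * barBoost);

        System.out.println("shard 1: norm=" + normShard1
                + " score=" + score(fooBoost, normShard1));
        System.out.println("shard 2: norm=" + normShard2
                + " score=" + score(barBoost, normShard2));
    }
}
```

Under these assumptions the boost of 20 is cancelled by the norm of .05 on shard 1, so a foo hit there scores the same as an unboosted bar hit on shard 2.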

        Uwe Schindler added a comment -

        The query is constant score, so the score is always the same (the boost factor). What is the problem?


          People

          • Assignee: Uwe Schindler
          • Reporter: Nik Everett
          • Votes: 0
          • Watchers: 4