Lucene - Core
LUCENE-5245

ConstantScoreAutoRewrite rewrites prefix queries that don't match anything before query weight is calculated

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.4
    • Fix Version/s: 4.5, Trunk
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      ConstantScoreAutoRewrite rewrites prefix queries that don't match anything before the query weight is calculated. This dramatically changes the resulting score, which is bad when comparing scores across different Lucene indexes/shards/whatever.

      1. LUCENE-5245.patch
        6 kB
        Uwe Schindler
      2. LUCENE-5245.patch
        2 kB
        Uwe Schindler
      3. LUCENE-5245.patch
        0.8 kB
        Nik Everett

        Activity

        Transition           Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open -> Resolved     10h 17m                 1                 Uwe Schindler   26/Sep/13 07:54
        Resolved -> Closed   9d 3h 24m               1                 Adrien Grand    05/Oct/13 11:19
        Adrien Grand made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Adrien Grand added a comment -

        4.5 release -> bulk close

        ASF subversion and git services added a comment -

        Commit 1526581 from Adrien Grand in branch 'dev/branches/lucene_solr_4_5'
        [ https://svn.apache.org/r1526581 ]

        LUCENE-5245: backport to lucene_4_5

        Adrien Grand made changes -
        Fix Version/s 4.5 [ 12324742 ]
        Fix Version/s 4.6 [ 12324999 ]
        ASF subversion and git services added a comment -

        Commit 1526573 from Adrien Grand in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1526573 ]

        LUCENE-5245: backport to lucene_4_5

        ASF subversion and git services added a comment -

        Commit 1526571 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1526571 ]

        LUCENE-5245: backport to lucene_4_5

        Nik Everett added a comment -

        Thanks for jumping on this so quickly!

        Uwe Schindler made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Uwe Schindler added a comment -

        Thanks Nik!

        ASF subversion and git services added a comment -

        Commit 1526401 from Uwe Schindler in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1526401 ]

        Merged revision(s) 1526399 from lucene/dev/trunk:
        LUCENE-5245: Fix MultiTermQuery's constant score rewrites to always return a ConstantScoreQuery to make scoring consistent. Previously it returned an empty unwrapped BooleanQuery, if no terms were available, which has a different query norm

        ASF subversion and git services added a comment -

        Commit 1526399 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1526399 ]

        LUCENE-5245: Fix MultiTermQuery's constant score rewrites to always return a ConstantScoreQuery to make scoring consistent. Previously it returned an empty unwrapped BooleanQuery, if no terms were available, which has a different query norm

        Uwe Schindler made changes -
        Attachment LUCENE-5245.patch [ 12605118 ]
        Uwe Schindler added a comment -

        New patch including test case that compares all 3 constant rewrites and also all 3 constant rewrites with a non-matching MTQ (using a should with a dummy term, so the query norm can be checked to be identical).

        I will commit this tomorrow.

        Uwe Schindler made changes -
        Attachment LUCENE-5245.patch [ 12605112 ]
        Uwe Schindler made changes -
        Attachment LUCENE-5245.patch [ 12605114 ]
        Uwe Schindler made changes -
        Attachment LUCENE-5245.patch [ 12605112 ]
        Uwe Schindler added a comment -

        Here is a patch that fixes both issues!

        Michael McCandless: The issue only affects rewrites with 0 terms, so our shortcut is too aggressive. In that case we return an empty BooleanQuery(true), which has a different query norm than ConstantScoreQuery, resulting in different scores. To be consistent we should return the same query type (ConstantScoreQuery for the constant rewrites). This has no speed impact, as the scorer is always empty.
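
A toy model of the shortcut and the fix, for illustration only. The types below (ToyQuery, EmptyBoolean, ConstantScore) are hypothetical stand-ins, not Lucene classes. They track only what a rewritten query contributes to the sum of squared weights that feeds the query norm, under the assumption that a constant-score query contributes its boost squared and an empty boolean query contributes zero.

```java
public class RewriteSketch {
    // Hypothetical stand-in for a rewritten query; it only reports its
    // contribution to the sum of squared weights used for the query norm.
    interface ToyQuery {
        double sumOfSquaredWeights();
    }

    // Stand-in for an empty boolean query: no clauses, so it contributes nothing.
    static final class EmptyBoolean implements ToyQuery {
        @Override
        public double sumOfSquaredWeights() {
            return 0.0;
        }
    }

    // Stand-in for a constant-score wrapper: assumed to contribute boost^2,
    // whether or not any document matches.
    static final class ConstantScore implements ToyQuery {
        final double boost;

        ConstantScore(double boost) {
            this.boost = boost;
        }

        @Override
        public double sumOfSquaredWeights() {
            return boost * boost;
        }
    }

    // Behaviour described as the old shortcut: zero matching terms yields
    // a bare empty boolean query.
    static ToyQuery rewriteBeforeFix(int matchingTerms, double boost) {
        return matchingTerms == 0 ? new EmptyBoolean() : new ConstantScore(boost);
    }

    // Behaviour described as the fix: always the constant-score wrapper,
    // so the term count no longer changes the query type.
    static ToyQuery rewriteAfterFix(int matchingTerms, double boost) {
        return new ConstantScore(boost);
    }

    public static void main(String[] args) {
        double boost = 20.0;
        for (int terms : new int[] {0, 3}) {
            System.out.println("terms=" + terms
                + " before=" + rewriteBeforeFix(terms, boost).sumOfSquaredWeights()
                + " after=" + rewriteAfterFix(terms, boost).sumOfSquaredWeights());
        }
    }
}
```

In this model the contribution after the fix is the same for 0 terms and for 3 terms, which is the consistency the comment asks for. Before the fix it drops to zero when nothing matches.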

        Uwe Schindler made changes -
        Fix Version/s 5.0 [ 12321663 ]
        Fix Version/s 4.6 [ 12324999 ]
        Uwe Schindler made changes -
        Assignee Uwe Schindler [ thetaphi ]
        Uwe Schindler added a comment -

        ScoringRewrite#CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE has the same problem.

        Uwe Schindler added a comment -

        Ah sorry. It only applies for the case where no term is found. Yes, in that case the boost is missing and affects query norm!

        Thanks for opening the issue.

        Uwe Schindler added a comment -

        Your patch applies the constant scoring 2 times and also multiplies boost 2 times.

        Nik Everett made changes -
        Field Original Value New Value
        Attachment LUCENE-5245.patch [ 12605106 ]
        Nik Everett added a comment -

        This fixes my problem but I'm not sure how to set up unit tests in Lucene.

        Nik Everett added a comment -

        The query norm applied to the constant score query changes. Say I had a query string like "foo:findm*^20 bar:findm*" and only foo had a result on shard 1 and only bar had a result on shard 2. Both end up with the same score because on shard 1 the query is rewritten to "foo:findm*^20" (norm = .05) and on shard 2 to "bar:findm*" (norm = 1).
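
The arithmetic behind these two norms can be sketched as follows. This is a minimal illustration that assumes the DefaultSimilarity-style formula queryNorm = 1 / sqrt(sumOfSquaredWeights), where each constant-score clause contributes its boost squared and a clause rewritten to an empty query contributes nothing. Coord and all other scoring factors are ignored. The class and method names are hypothetical and are not Lucene API.

```java
public class QueryNormSketch {
    // Assumed formula: 1 / sqrt(sum of squared boosts of the clauses that
    // still carry a weight after rewriting).
    static double queryNorm(double... survivingClauseBoosts) {
        double sumOfSquares = 0.0;
        for (double boost : survivingClauseBoosts) {
            sumOfSquares += boost * boost;
        }
        double norm = 1.0 / Math.sqrt(sumOfSquares);
        // With no surviving clauses the division yields infinity; fall back to 1.
        return (Double.isInfinite(norm) || Double.isNaN(norm)) ? 1.0 : norm;
    }

    // Constant-score hit: the clause boost scaled by the query norm.
    static double score(double boost, double norm) {
        return boost * norm;
    }

    public static void main(String[] args) {
        // Shard 1: only foo:findm*^20 matches; bar:findm* was rewritten away.
        double shard1Norm = queryNorm(20.0);
        // Shard 2: only bar:findm* matches; foo:findm*^20 was rewritten away.
        double shard2Norm = queryNorm(1.0);

        System.out.println("shard 1: norm=" + shard1Norm + " score=" + score(20.0, shard1Norm));
        System.out.println("shard 2: norm=" + shard2Norm + " score=" + score(1.0, shard2Norm));
    }
}
```

Under these assumptions both shards report roughly the same top score, even though the foo clause carries a boost of 20, because each shard normalizes by a different sum.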

        Uwe Schindler added a comment -

        The query is constant score, so the score is always the same (the boost factor). What is the problem?

        Nik Everett created issue -

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Nik Everett
          • Votes:
            0
            Watchers:
            4
