  1. Lucene - Core
  2. LUCENE-9269

Blended queries with boolean rewrite can result in inconsistent scores


    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 8.4
    • Fix Version/s: None
    • Component/s: core/search
    • Labels: None
    • Lucene Fields: New

      Description

      If two blended queries are SHOULD clauses of a boolean query and are built so that

      • some of their terms are the same
      • their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE

      the docFreq (df) used to score the overlapping terms is picked as follows:

      1. if the overlapping terms are not boosted, the df of the term in the first blended query is used
      2. if any of the overlapping terms is boosted, the df is picked at (what looks like) random.

      A few examples, using a field f with two terms: f:a (df: 2) and f:b (df: 3).

      a)
      Blended(f:a f:b) Blended(f:a)
              df: 3            df: 2
      gets rewritten to:
      (f:a)^2.0 (f:b)
       df: 3     df: 2
      
      b)
      Blended(f:a) Blended(f:a f:b)
              df: 2        df: 3
      gets rewritten to:
      (f:a)^2.0 (f:b)
       df: 2     df: 2
      
      c)
      Blended(f:a f:b^0.66) Blended(f:a^0.75)
              df: 3                 df: 2
      gets rewritten to:
      (f:a)^1.75 (f:b)^0.66
       df: ?      df: 2
      

      where ? is either 2 or 3, depending on the run.
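
      The unboosted case (examples a and b) can be mimicked with a small toy model. This is only a sketch of the observed behaviour, not Lucene code: the class BlendedDfSketch and its methods are hypothetical, and it assumes that each blended query gives all of its terms the largest df among them, which is consistent with the df values shown above. It looks only at the overlapping term f:a and says nothing about the boosted case (c).

```java
import java.util.List;
import java.util.Map;

/** Toy model only: none of these names are Lucene classes or methods. */
public class BlendedDfSketch {

    /** Assumed blending rule: every term of a blended query gets the largest raw df. */
    static int blendedDf(List<String> blended, Map<String, Integer> rawDf) {
        int max = 0;
        for (String term : blended) {
            max = Math.max(max, rawDf.get(term));
        }
        return max;
    }

    /** Observed rule 1: the df comes from the first blended query that contains the term. */
    static int firstWinsDf(String term, List<List<String>> clauses, Map<String, Integer> rawDf) {
        for (List<String> blended : clauses) {
            if (blended.contains(term)) {
                return blendedDf(blended, rawDf);
            }
        }
        throw new IllegalArgumentException("term not in any clause: " + term);
    }

    public static void main(String[] args) {
        // Raw document frequencies from the report: f:a (df: 2), f:b (df: 3).
        Map<String, Integer> rawDf = Map.of("f:a", 2, "f:b", 3);

        // Example a): Blended(f:a f:b) Blended(f:a)
        int a = firstWinsDf("f:a", List.of(List.of("f:a", "f:b"), List.of("f:a")), rawDf);

        // Example b): Blended(f:a) Blended(f:a f:b)
        int b = firstWinsDf("f:a", List.of(List.of("f:a"), List.of("f:a", "f:b")), rawDf);

        System.out.println("a) f:a df = " + a);
        System.out.println("b) f:a df = " + b);
    }
}
```

      In this model, swapping the order of the two SHOULD clauses changes the df used for f:a, and that order dependence is the inconsistency reported here.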

       

        Attachments

        1. LUCENE-9269-test.patch
          3 kB
          Michele Palmia

            People

            • Assignee: Unassigned
            • Reporter: Michele Palmia (micpalmia)
            • Votes: 0
            • Watchers: 3
