Lucene - Core / LUCENE-9269

Blended queries with boolean rewrite can result in inconsistent scores

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 8.4
    • Fix Version/s: None
    • Component/s: core/search
    • Labels: None
    • Lucene Fields: New

    Description

      If two blended queries are SHOULD clauses of a boolean query and are built so that

      • some of their terms are the same
      • their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE

      the docFreq used to score the overlapping terms is picked as follows:

      1. if none of the overlapping terms is boosted, the df of the term in the first blended query is used
      2. if any of the overlapping terms is boosted, the df is picked seemingly at random.

      A few examples using a field with two terms, f:a (df: 2) and f:b (df: 3):

      a)
      Blended(f:a f:b) Blended(f:a)
              df: 3            df: 2
      gets rewritten to:
      (f:a)^2.0 (f:b)
       df: 3     df: 2
      
      b)
      Blended(f:a) Blended(f:a f:b)
              df: 2        df: 3
      gets rewritten to:
      (f:a)^2.0 (f:b)
       df: 2     df:2
      
      c)
      Blended(f:a f:b^0.66) Blended(f:a^0.75)
              df: 3                 df: 2
      gets rewritten to:
      (f:a)^1.75 (f:b)^0.66
       df: ?      df: 2
      

      where ? is either 2 or 3, depending on the run.
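The unboosted cases a) and b) can be mimicked with a small toy model. This is only a sketch of the observed outcome, not Lucene's actual rewrite code: the class `OverlapMergeSketch` and its `Clause`, `Merged`, and `mergeOverlaps` names are hypothetical, and it assumes that duplicate term clauses are merged by summing their boosts while the first df encountered is kept. It does not attempt to model the run-dependent df seen in the boosted case c); it only reproduces the summed boost there.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only (hypothetical names, no Lucene classes involved).
public class OverlapMergeSketch {

    /** A term clause of a blended query: term, boost, and the df the blended query assigned to it. */
    static final class Clause {
        final String term;
        final float boost;
        final int blendedDf;

        Clause(String term, float boost, int blendedDf) {
            this.term = term;
            this.boost = boost;
            this.blendedDf = blendedDf;
        }
    }

    /** An overlapping term after merging: summed boost and the df that ends up being used. */
    static final class Merged {
        float boost;
        final int df;
        int occurrences = 1;

        Merged(float boost, int df) {
            this.boost = boost;
            this.df = df;
        }
    }

    /**
     * Merges clauses of several blended queries by term. Assumption: boosts are summed
     * and the df of the first occurrence wins. Only terms that occur in more than one
     * clause (the overlapping ones) are returned.
     */
    static Map<String, Merged> mergeOverlaps(List<List<Clause>> blendedQueries) {
        Map<String, Merged> byTerm = new LinkedHashMap<>();
        for (List<Clause> query : blendedQueries) {
            for (Clause c : query) {
                Merged seen = byTerm.get(c.term);
                if (seen == null) {
                    byTerm.put(c.term, new Merged(c.boost, c.blendedDf));
                } else {
                    seen.boost += c.boost; // first df is kept, later ones are dropped
                    seen.occurrences++;
                }
            }
        }
        byTerm.values().removeIf(m -> m.occurrences < 2);
        return byTerm;
    }

    public static void main(String[] args) {
        // Blended(f:a f:b) carries df 3, Blended(f:a) carries df 2, as in the examples.
        List<Clause> ab = Arrays.asList(new Clause("f:a", 1f, 3), new Clause("f:b", 1f, 3));
        List<Clause> a = Arrays.asList(new Clause("f:a", 1f, 2));

        Merged caseA = mergeOverlaps(Arrays.asList(ab, a)).get("f:a");
        Merged caseB = mergeOverlaps(Arrays.asList(a, ab)).get("f:a");
        System.out.println("a) f:a^" + caseA.boost + " df=" + caseA.df); // a) f:a^2.0 df=3
        System.out.println("b) f:a^" + caseB.boost + " df=" + caseB.df); // b) f:a^2.0 df=2
    }
}
```

Swapping the order of the two blended queries changes the df reported for f:a while the summed boost stays the same, which matches the clause-order dependence reported in a) and b).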

       

      Attachments

        1. LUCENE-9269-test.patch
          3 kB
          Michele Palmia

          People

            Assignee: Unassigned
            Reporter: Michele Palmia (micpalmia)