[LUCENE-9269] Blended queries with boolean rewrite can result in inconsistent scores - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 8.4
Fix Version/s: None
Component/s: core/search
Labels: None
Lucene Fields: New

Description

If two blended queries are SHOULD clauses of a boolean query and are built so that

- some of their terms are the same, and
- their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE,

then the docFreq (df) used for scoring the overlapping terms is picked as follows:

- if none of the overlapping terms is boosted, the df of the term in the first blended query is used;
- if any of the overlapping terms is boosted, the df is picked at (what looks like) random, and can change from run to run.
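The unboosted case behaves as if SHOULD clauses on the same term were collapsed into one clause carrying the sum of their boosts (1.0 + 1.0 = 2.0 in the examples), with the first clause keeping its df. The Java sketch below models only that observation. `ShouldClauseMerge`, `TermClause` and `merge` are hypothetical names, not Lucene API, and the sketch does not claim to reproduce how BooleanQuery actually rewrites its clauses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the observed behavior; this is NOT Lucene's rewrite code.
public class ShouldClauseMerge {

    // A rewritten term clause: the term, its boost, and the blended df
    // that the owning blended query attached to it.
    public record TermClause(String term, float boost, int df) {}

    // Collapses SHOULD clauses on the same term into one clause whose boost
    // is the sum of the duplicates' boosts. The first clause seen for a term
    // keeps its df; later duplicates contribute only their boost.
    // LinkedHashMap preserves insertion order, so the result is deterministic.
    public static Map<String, TermClause> merge(List<TermClause> clauses) {
        Map<String, TermClause> merged = new LinkedHashMap<>();
        for (TermClause clause : clauses) {
            merged.merge(clause.term(), clause,
                (first, dup) -> new TermClause(first.term(), first.boost() + dup.boost(), first.df()));
        }
        return merged;
    }

    public static void main(String[] args) {
        // f:a from a blended query with df 3, then f:a from one with df 2.
        Map<String, TermClause> firstIsDf3 = merge(List.of(
            new TermClause("f:a", 1f, 3), new TermClause("f:a", 1f, 2)));
        // Same clauses, opposite order.
        Map<String, TermClause> firstIsDf2 = merge(List.of(
            new TermClause("f:a", 1f, 2), new TermClause("f:a", 1f, 3)));

        System.out.println(firstIsDf3.get("f:a")); // TermClause[term=f:a, boost=2.0, df=3]
        System.out.println(firstIsDf2.get("f:a")); // TermClause[term=f:a, boost=2.0, df=2]
    }
}
```

With an insertion-ordered map the first clause always wins. The boosted case described above suggests that the order in which duplicates are encountered is not stable across runs, which this model deliberately does not try to reproduce.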

A few examples, using a field with two terms: f:a (df: 2) and f:b (df: 3).

a)
Blended(f:a f:b) Blended(f:a)
df: 3            df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
df: 3     df: 2

b)
Blended(f:a) Blended(f:a f:b)
df: 2        df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
df: 2     df: 2

c)
Blended(f:a f:b^0.66) Blended(f:a^0.75)
df: 3                 df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
df: ?      df: 2

where ? is either 2 or 3, depending on the run.
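The two candidate values of ? in example c) can be traced with a small worked model. It assumes, consistent with the df values listed under each blended query, that a blended query's df is the maximum df of its terms, and that the merged f:a clause sums its duplicates' boosts (1.0 + 0.75 = 1.75). `BlendedExampleC` and `blendedDf` are hypothetical names, not Lucene API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of example c); this is NOT Lucene code.
public class BlendedExampleC {

    // Assumption: a blended query's df is the maximum df of its terms,
    // which matches the df values shown under each blended query.
    public static int blendedDf(List<String> terms, Map<String, Integer> termDf) {
        int df = 0;
        for (String term : terms) {
            df = Math.max(df, termDf.get(term));
        }
        return df;
    }

    public static void main(String[] args) {
        Map<String, Integer> termDf = Map.of("f:a", 2, "f:b", 3);

        int dfQuery1 = blendedDf(List.of("f:a", "f:b"), termDf); // Blended(f:a f:b^0.66)
        int dfQuery2 = blendedDf(List.of("f:a"), termDf);        // Blended(f:a^0.75)

        // f:a has boost 1.0 in the first query and 0.75 in the second;
        // the merged clause carries their sum.
        float boost = 1.0f + 0.75f;

        // The merged f:a clause keeps the df of whichever duplicate is
        // encountered first, so an unstable order yields either value.
        System.out.println("(f:a)^" + boost + " df=" + dfQuery1 + " if Blended(f:a f:b^0.66) comes first");
        System.out.println("(f:a)^" + boost + " df=" + dfQuery2 + " if Blended(f:a^0.75) comes first");
    }
}
```

The boost of the merged clause is the same in both cases; only the df, and therefore the idf part of the score, depends on which duplicate is kept.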

Attachments

LUCENE-9269-test.patch (3 kB), Michele Palmia, 10/Mar/20 12:16

People

Assignee: Unassigned

Reporter: Michele Palmia

Votes: 0

Watchers: 5

Dates

Created: 10/Mar/20 12:12

Updated: 28/Aug/22 15:59