Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Lucene Fields: New
Description
This is a spin-off issue from the discussion in https://github.com/apache/lucene/pull/418#issuecomment-967790816, proposing a quick fix for CombinedFieldsQuery scoring.
Currently, CombinedFieldsQuery uses a constructed fields object to create a MultiNormsLeafSimScorer for scoring. Because that fields object is built by looping over fieldTerms, it may contain duplicated field-weight pairs, which results in duplicated norms being added during the scoring calculation in MultiNormsLeafSimScorer.
For example, consider a CombinedFieldsQuery with two fields and two terms that both match a particular doc:
CombinedFieldQuery query = new CombinedFieldQuery.Builder()
    .addField("field1", (float) 1.0)
    .addField("field2", (float) 1.0)
    .addTerm(new BytesRef("foo"))
    .addTerm(new BytesRef("zoo"))
    .build();
I would expect the scoring to be based on the following:
- Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) + freq(field1:zoo) + freq(field2:zoo)
- Sum of norms on doc = norm(field1) + norm(field2)
but the current logic uses the following for scoring:
- Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) + freq(field1:zoo) + freq(field2:zoo)
- Sum of norms on doc = norm(field1) + norm(field2) + norm(field1) + norm(field2)
In addition, this differs from how MultiNormsLeafSimScorer is constructed in the explain function of CombinedFieldsQuery, which uses fieldAndWeights.values() and therefore does not contain duplicated field-weight pairs.
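To make the difference concrete, here is a minimal, self-contained sketch of the two ways of collecting fields. It does not use Lucene: the FieldAndWeight class, the fieldsPerTerm and sumNorms helpers, and the norm values 10 and 20 are all hypothetical stand-ins chosen for illustration, and the weighted sum is only an approximation of what MultiNormsLeafSimScorer accumulates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: these types are hypothetical stand-ins, not Lucene classes.
class DuplicateNormsSketch {

    // Hypothetical field-weight pair, mirroring the entries of fieldAndWeights.
    static final class FieldAndWeight {
        final String field;
        final float weight;

        FieldAndWeight(String field, float weight) {
            this.field = field;
            this.weight = weight;
        }
    }

    // Builds the scorer input with one entry per (field, term) pair,
    // which repeats each field once per query term.
    static List<FieldAndWeight> fieldsPerTerm(
            Map<String, FieldAndWeight> fieldAndWeights, List<String> terms) {
        List<FieldAndWeight> fields = new ArrayList<>();
        for (String term : terms) {
            for (FieldAndWeight fw : fieldAndWeights.values()) {
                fields.add(fw);
            }
        }
        return fields;
    }

    // Sums weight * norm over the given fields for a single document.
    static float sumNorms(Iterable<FieldAndWeight> fields, Map<String, Long> docNorms) {
        float sum = 0f;
        for (FieldAndWeight fw : fields) {
            sum += fw.weight * docNorms.get(fw.field);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, FieldAndWeight> fieldAndWeights = new LinkedHashMap<>();
        fieldAndWeights.put("field1", new FieldAndWeight("field1", 1.0f));
        fieldAndWeights.put("field2", new FieldAndWeight("field2", 1.0f));
        List<String> terms = List.of("foo", "zoo");
        // Made-up norm values for one document.
        Map<String, Long> docNorms = Map.of("field1", 10L, "field2", 20L);

        // Per-term collection counts each field's norm twice.
        System.out.println(sumNorms(fieldsPerTerm(fieldAndWeights, terms), docNorms));
        // Collecting from the map's values counts each field's norm once.
        System.out.println(sumNorms(fieldAndWeights.values(), docNorms));
    }
}
```

With two terms, the per-term list holds four entries and the summed norm doubles, whereas iterating over the map's values yields each field exactly once, which matches what the explain path is described as doing.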