Lucene - Core / LUCENE-10236

CombinedFieldsQuery to use fieldAndWeights.values() when constructing MultiNormsLeafSimScorer for scoring

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.1
    • Component/s: modules/sandbox
    • Labels: None
    • Lucene Fields: New

    Description

      This is a spin-off issue from the discussion in https://github.com/apache/lucene/pull/418#issuecomment-967790816, proposing a quick fix to CombinedFieldsQuery scoring.

      Currently, CombinedFieldsQuery passes a constructed fields object to MultiNormsLeafSimScorer for scoring. Because that object is built by looping over fieldTerms, it may contain duplicated field-weight pairs, which results in duplicated norms being added during the score calculation in MultiNormsLeafSimScorer.

      For example, consider a CombinedFieldsQuery with two fields and two terms that match a particular doc:

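      import org.apache.lucene.sandbox.search.CombinedFieldQuery;
      import org.apache.lucene.util.BytesRef;

      // Two equally weighted fields and two terms; each term is searched in both fields.
      CombinedFieldQuery query =
          new CombinedFieldQuery.Builder()
              .addField("field1", 1.0f)
              .addField("field2", 1.0f)
              .addTerm(new BytesRef("foo"))
              .addTerm(new BytesRef("zoo"))
              .build();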

      I would expect the scoring to be based on the following:

      1. Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) + freq(field1:zoo) + freq(field2:zoo)
      2. Sum of norms on doc = norm(field1) + norm(field2)

      but the current logic uses the following for scoring:

      1. Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) + freq(field1:zoo) + freq(field2:zoo)
      2. Sum of norms on doc = norm(field1) + norm(field2) + norm(field1) + norm(field2)
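
      The difference between the two norm sums can be illustrated with a small standalone sketch. It does not use Lucene: the class name NormSumSketch, the sumNorms helper, and the norm values 4 and 6 are all made up for illustration, and norms are treated as plain per-field numbers rather than Lucene's encoded norms.

```java
import java.util.List;
import java.util.Map;

public class NormSumSketch {
  // Hypothetical per-field norms for one document (illustrative values, not from Lucene).
  static final Map<String, Long> NORMS = Map.of("field1", 4L, "field2", 6L);

  // Sums the norm of every field in the list, counting repeated fields each time they appear.
  static long sumNorms(List<String> fields) {
    long sum = 0;
    for (String field : fields) {
      sum += NORMS.get(field);
    }
    return sum;
  }

  public static void main(String[] args) {
    // One entry per field: norm(field1) + norm(field2).
    long expected = sumNorms(List.of("field1", "field2"));
    // One entry per (field, term) pair: each field is counted once per term.
    long current = sumNorms(List.of("field1", "field2", "field1", "field2"));
    System.out.println("expected=" + expected + " current=" + current);
  }
}
```

      With two terms, every field's norm is counted twice, so the combined norm in this sketch doubles from 10 to 20 while the frequency sum stays the same.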

       

      In addition, this differs from how MultiNormsLeafSimScorer is constructed in the CombinedFieldsQuery explain function, which uses fieldAndWeights.values() and therefore does not contain duplicated field-weight pairs.
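
      As a rough sketch of why the two construction paths disagree, the following standalone Java compares a list built by looping over (field, term) pairs with the values of a map keyed by field. It is not the Lucene source: FieldAndWeight here is a hypothetical stand-in class, and sampleWeights, sampleFieldTermFields, fromFieldTerms, and fromMapValues are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldWeightDedupSketch {
  // Hypothetical stand-in for the query's field-weight pair.
  static final class FieldAndWeight {
    final String field;
    final float weight;

    FieldAndWeight(String field, float weight) {
      this.field = field;
      this.weight = weight;
    }
  }

  // One FieldAndWeight per field, keyed by field name.
  static Map<String, FieldAndWeight> sampleWeights() {
    Map<String, FieldAndWeight> map = new LinkedHashMap<>();
    map.put("field1", new FieldAndWeight("field1", 1.0f));
    map.put("field2", new FieldAndWeight("field2", 1.0f));
    return map;
  }

  // Field of each (field, term) pair: field1:foo, field2:foo, field1:zoo, field2:zoo.
  static List<String> sampleFieldTermFields() {
    return List.of("field1", "field2", "field1", "field2");
  }

  // Looks up one pair per (field, term) entry, so each field repeats once per term.
  static List<FieldAndWeight> fromFieldTerms(
      Map<String, FieldAndWeight> fieldAndWeights, List<String> fieldTermFields) {
    List<FieldAndWeight> fields = new ArrayList<>();
    for (String field : fieldTermFields) {
      fields.add(fieldAndWeights.get(field));
    }
    return fields;
  }

  // Takes the map's values directly: exactly one pair per field.
  static Collection<FieldAndWeight> fromMapValues(Map<String, FieldAndWeight> fieldAndWeights) {
    return fieldAndWeights.values();
  }

  public static void main(String[] args) {
    Map<String, FieldAndWeight> weights = sampleWeights();
    System.out.println(fromFieldTerms(weights, sampleFieldTermFields()).size()); // prints 4
    System.out.println(fromMapValues(weights).size()); // prints 2
  }
}
```

      Passing the map's values to both the scoring and explain paths would keep the two consistent, which is what the issue title proposes.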

          People

            Assignee: Zach Chen (zacharymorn)
            Reporter: Zach Chen (zacharymorn)

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 7h