Lucene - Core / LUCENE-8025

compute avgdl correctly for DOCS_ONLY

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.2, 8.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      Spinoff of LUCENE-8007:

      If you omit term frequencies, we should score as if all tf values were 1. This is how it worked for e.g. ClassicSimilarity, and it is easy to understand how scoring degrades in that case.

      However, for sims such as BM25, we currently bail out of computing the average document length (avgdl) and just return a bogus value of 1. That also breaks length normalization, which is a separate concern from tf.

      Instead of a bogus value, we should substitute sumDocFreq for sumTotalTermFreq: when frequencies are omitted, every posting has a freq of 1, so the number of postings equals the total term frequency.
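
      The substitution described above can be sketched as follows. This is a minimal standalone illustration, not the attached patch: the class name AvgDocLength and the method avgFieldLength are hypothetical, and the sketch assumes that a negative value (such as -1) marks a statistic as unavailable, and that 1 is returned only when neither sum can be used.

```java
public class AvgDocLength {

    /**
     * Hypothetical helper: average field length for a similarity such as BM25.
     *
     * Assumption: a negative argument means the statistic is unavailable,
     * e.g. sumTotalTermFreq is not recorded for a DOCS_ONLY field.
     */
    public static float avgFieldLength(long sumTotalTermFreq, long sumDocFreq, long docCount) {
        long totalLength = sumTotalTermFreq;
        if (totalLength < 0) {
            // Frequencies were omitted, so every posting counts as tf = 1
            // and the number of postings stands in for the total length.
            totalLength = sumDocFreq;
        }
        if (totalLength < 0 || docCount <= 0) {
            // Nothing usable: fall back to the old placeholder value.
            return 1f;
        }
        return (float) (totalLength / (double) docCount);
    }

    public static void main(String[] args) {
        // DOCS_ONLY field: 1200 postings over 400 documents.
        System.out.println(avgFieldLength(-1L, 1200L, 400L));
        // Field with frequencies: sumTotalTermFreq is used directly.
        System.out.println(avgFieldLength(2000L, 1200L, 400L));
    }
}
```

      With the fallback in place, a DOCS_ONLY field gets an average length that reflects how many distinct terms each document holds, rather than a constant 1 for every index.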

      Attachments

        1. LUCENE-8025.patch
          4 kB
          Robert Muir

          People

            Assignee: Unassigned
            Reporter: Robert Muir (rcmuir)
            Votes: 0
            Watchers: 2
