  1. Lucene - Core
  2. LUCENE-8025

compute avgdl correctly for DOCS_ONLY

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.2, 8.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff of LUCENE-8007:

      If you omit term frequencies, we should score as if all tf values were 1. This is how it worked for e.g. ClassicSimilarity, and it makes the degradation easy to understand.

      However, for sims such as BM25, we currently bail out of computing the average doc length (and just return a bogus value of 1). That also breaks length normalization, which is a separate concern from term frequency.

      Instead of returning a bogus value, we should substitute sumDocFreq for sumTotalTermFreq: when frequencies are omitted, every posting counts as a freq of 1, so the two sums are equal.
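
      The proposed fallback can be illustrated with a minimal, standalone sketch. It does not use Lucene classes and is not the attached patch: the class name AvgdlSketch, the method signature, and the use of -1 as an "unavailable" marker are assumptions made for illustration only.

```java
// Hypothetical standalone sketch; names and the -1 sentinel are assumptions,
// not Lucene's actual API or the contents of LUCENE-8025.patch.
final class AvgdlSketch {
    // Assumed marker meaning "this statistic was not recorded".
    static final long UNAVAILABLE = -1L;

    /**
     * Average field length (avgdl) = total number of term occurrences / number of docs.
     * When term frequencies are omitted, each posting is treated as tf = 1, so the
     * number of postings (sumDocFreq) stands in for sumTotalTermFreq.
     */
    static float avgFieldLength(long sumTotalTermFreq, long sumDocFreq, long docCount) {
        if (docCount <= 0) {
            return 1f; // nothing to average over; keep a neutral value
        }
        long totalLength = sumTotalTermFreq;
        if (totalLength == UNAVAILABLE) {
            if (sumDocFreq == UNAVAILABLE) {
                return 1f; // no usable statistic at all
            }
            totalLength = sumDocFreq; // tf = 1 for every posting
        }
        return (float) (totalLength / (double) docCount);
    }

    public static void main(String[] args) {
        // Frequencies omitted: 6 postings over 3 docs.
        System.out.println(avgFieldLength(UNAVAILABLE, 6, 3)); // prints 2.0
        // Frequencies present: 12 term occurrences over 3 docs.
        System.out.println(avgFieldLength(12, 6, 3)); // prints 4.0
    }
}
```

      With this fallback, a field indexed without frequencies still gets an average length that reflects how many distinct terms its documents contain, rather than a constant 1.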

        Attachments

        1. LUCENE-8025.patch
          4 kB
          Robert Muir

            People

            • Assignee:
              Unassigned
            • Reporter:
              Robert Muir (rcmuir)
