  1. Lucene - Core
  2. LUCENE-8011

Improve similarity explanations

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.0
    • Component/s: None
    • Labels:
    • Lucene Fields: New

      Description

      LUCENE-7997 improved the BM25 and Classic similarity explanations so that each factor of the score is named and described:

      product of:
        2.2 = scaling factor, k1 + 1
        9.388654 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
          1.0 = n, number of documents containing term
          17927.0 = N, total number of documents with field
        0.9987758 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
          979.0 = freq, occurrences of term within document
          1.2 = k1, term saturation parameter
          0.75 = b, length normalization parameter
          1.0 = dl, length of field
          1.0 = avgdl, average length of field
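
The formulas quoted in the explanation above can be checked by hand. The following is a minimal sketch, not Lucene's own code: the class and method names are hypothetical, and it uses double arithmetic, so results may differ from Lucene's float output in the last digits.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; this is not Lucene's BM25Similarity.
class Bm25ExplainSketch {

    // idf = log(1 + (N - n + 0.5) / (n + 0.5))
    // n = number of documents containing the term, N = documents with the field
    static double idf(double n, double N) {
        return Math.log(1 + (N - n + 0.5) / (n + 0.5));
    }

    // tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    static double tf(double freq, double k1, double b, double dl, double avgdl) {
        return freq / (freq + k1 * (1 - b + b * dl / avgdl));
    }

    // Product of the three factors listed in the explanation:
    // scaling factor (k1 + 1), idf, and tf.
    static double score(double n, double N, double freq,
                        double k1, double b, double dl, double avgdl) {
        return (k1 + 1) * idf(n, N) * tf(freq, k1, b, dl, avgdl);
    }

    public static void main(String[] args) {
        // Values copied from the explanation above.
        double n = 1.0, N = 17927.0;
        double freq = 979.0, k1 = 1.2, b = 0.75, dl = 1.0, avgdl = 1.0;

        System.out.printf(Locale.ROOT, "idf   = %.6f%n", idf(n, N));
        System.out.printf(Locale.ROOT, "tf    = %.7f%n", tf(freq, k1, b, dl, avgdl));
        System.out.printf(Locale.ROOT, "score = %.4f%n", score(n, N, freq, k1, b, dl, avgdl));
    }
}
```

With the inputs shown, the computed idf and tf agree with the 9.388654 and 0.9987758 reported in the explanation to within rounding.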
      

      Previously the output was cryptic and used terms such as docCount and docFreq without explaining them:

      product of:
        0.016547536 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
          449.0 = docFreq
          456.0 = docCount
        2.1920826 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
          113659.0 = freq=113658
          1.2 = parameter k1
          0.75 = parameter b
          2300.5593 = avgFieldLength
          1048600.0 = fieldLength
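
Comparing the two formats, the old tfNorm formula appears to be the new tf formula multiplied by (k1 + 1), which the new output reports as a separate scaling factor. The sketch below checks this numerically under stated assumptions: the class and method names are hypothetical, the arithmetic is done in doubles rather than Lucene's floats, and freq is taken as 113659.0 even though the old output also prints freq=113658.

```java
import java.util.Locale;

// Hypothetical comparison of the old and new explanation formulas;
// illustration only, not Lucene code.
class Bm25OldVsNewSketch {

    // Old format:
    // tfNorm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))
    static double tfNormOld(double freq, double k1, double b,
                            double fieldLength, double avgFieldLength) {
        return (freq * (k1 + 1))
                / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength));
    }

    // New format: tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    static double tfNew(double freq, double k1, double b, double dl, double avgdl) {
        return freq / (freq + k1 * (1 - b + b * dl / avgdl));
    }

    // Same idf formula in both formats; only the variable names changed
    // (docFreq became n, docCount became N).
    static double idf(double docFreq, double docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        // Values copied from the old-format explanation above.
        double freq = 113659.0, k1 = 1.2, b = 0.75;
        double fieldLength = 1048600.0, avgFieldLength = 2300.5593;

        double oldValue = tfNormOld(freq, k1, b, fieldLength, avgFieldLength);
        double newValue = (k1 + 1) * tfNew(freq, k1, b, fieldLength, avgFieldLength);

        System.out.printf(Locale.ROOT, "old tfNorm       = %.7f%n", oldValue);
        System.out.printf(Locale.ROOT, "(k1 + 1) * tf    = %.7f%n", newValue);
        System.out.printf(Locale.ROOT, "idf(449, 456)    = %.9f%n", idf(449.0, 456.0));
    }
}
```

Both expressions evaluate to the same number, so the rewrite changes how the score is presented, not how it is computed.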
      

      We should fix the other similarities in the same way, so that their explanations are more practical to read.

              People

              • Assignee: Unassigned
              • Reporter: Robert Muir (rcmuir)