Description
LUCENE-7997 improves the BM25 and Classic explain output so that it names each value and the formula it comes from:
product of:
  2.2 = scaling factor, k1 + 1
  9.388654 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
    1.0 = n, number of documents containing term
    17927.0 = N, total number of documents with field
  0.9987758 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
    979.0 = freq, occurrences of term within document
    1.2 = k1, term saturation parameter
    0.75 = b, length normalization parameter
    1.0 = dl, length of field
    1.0 = avgdl, average length of field
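As a sanity check, the numbers in this explain output can be reproduced by plugging the listed values into the two formulas it prints. The sketch below is plain Python arithmetic, not Lucene code; the function names `bm25_idf` and `bm25_tf` are made up for illustration. Lucene computes these values in single precision, so the last digits may differ slightly.

```python
import math

def bm25_idf(n, N):
    # idf as printed in the explain output: log(1 + (N - n + 0.5) / (n + 0.5))
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

def bm25_tf(freq, k1, b, dl, avgdl):
    # tf as printed in the explain output:
    # freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

# Values copied from the explain output above.
k1, b = 1.2, 0.75
idf = bm25_idf(n=1.0, N=17927.0)
tf = bm25_tf(freq=979.0, k1=k1, b=b, dl=1.0, avgdl=1.0)

# "product of": scaling factor (k1 + 1) * idf * tf
score = (k1 + 1) * idf * tf

print(f"idf   = {idf:.6f}")
print(f"tf    = {tf:.7f}")
print(f"score = {score:.5f}")
```

The computed idf and tf should agree with the 9.388654 and 0.9987758 shown in the explanation to within float rounding.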
Previously the output was pretty cryptic and used confusing terminology like docCount/docFreq without explanation:
product of:
  0.016547536 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
    449.0 = docFreq
    456.0 = docCount
  2.1920826 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
    113659.0 = freq=113658
    1.2 = parameter k1
    0.75 = parameter b
    2300.5593 = avgFieldLength
    1048600.0 = fieldLength
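The old tfNorm formula folds the k1 + 1 factor into the term-frequency component. A quick arithmetic check, again in plain Python with hypothetical helper names rather than Lucene code, suggests that tfNorm is the same quantity as (k1 + 1) multiplied by a tf of the form freq / (freq + k1 * (1 - b + b * dl / avgdl)), using the values from the old explain output:

```python
def old_tf_norm(freq, k1, b, field_length, avg_field_length):
    # Old explain formula:
    # (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )

def tf_without_scaling(freq, k1, b, dl, avgdl):
    # Same expression with the (k1 + 1) factor left out of the numerator.
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

# Values copied from the old explain output (freq taken as the 113659.0 shown).
k1, b = 1.2, 0.75
freq, field_length, avg_field_length = 113659.0, 1048600.0, 2300.5593

old = old_tf_norm(freq, k1, b, field_length, avg_field_length)
rescaled = (k1 + 1) * tf_without_scaling(freq, k1, b, field_length, avg_field_length)

print(f"old tfNorm      = {old:.7f}")
print(f"(k1 + 1) * tf   = {rescaled:.7f}")
```

Both expressions should agree up to floating-point rounding, and both should be close to the 2.1920826 reported as tfNorm.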
We should fix the other similarities in the same way so that their explanations are more practical.