[LUCENE-8025] compute avgdl correctly for DOCS_ONLY - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 7.2, 8.0
Component/s: None
Labels: None
Lucene Fields: New

Description

Spinoff of ~~LUCENE-8007~~:

If you omit term frequencies, we should score as if all tf values were 1. This is how it already works for e.g. ClassicSimilarity, and it makes it easy to understand how scoring degrades.

However, for sims such as BM25, we currently bail out of computing the average document length (and just return a bogus value of 1). That also skews length normalization, which is a separate concern from tf.

Instead of a bogus value, we should substitute sumDocFreq for sumTotalTermFreq: every posting has a freq of 1, since frequencies were omitted, so the two sums would be equal.
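
The proposed fallback can be sketched roughly as follows. This is a minimal, self-contained illustration and not the attached patch: the class `AvgdlSketch` and the method `avgFieldLength` are hypothetical names, the statistics are passed as plain `long` values rather than read from Lucene, and it assumes that a missing sumTotalTermFreq is reported as -1. Returning 1 for an empty field is a choice made for this sketch only.

```java
public final class AvgdlSketch {
    private AvgdlSketch() {}

    /**
     * Hypothetical helper: average field length with a fallback for fields
     * indexed without term frequencies.
     *
     * @param sumTotalTermFreq total term occurrences in the field; assumed
     *                         to be -1 when frequencies were omitted
     * @param sumDocFreq       total number of postings in the field
     * @param docCount         number of documents that have the field
     */
    static float avgFieldLength(long sumTotalTermFreq, long sumDocFreq, long docCount) {
        if (docCount <= 0) {
            // Empty field: nothing to average (sketch-only choice).
            return 1f;
        }
        // With frequencies omitted, every posting counts as tf = 1, so the
        // number of postings stands in for the number of term occurrences.
        long totalLength = sumTotalTermFreq >= 0 ? sumTotalTermFreq : sumDocFreq;
        return (float) ((double) totalLength / docCount);
    }

    public static void main(String[] args) {
        // Frequencies omitted: falls back to sumDocFreq / docCount.
        System.out.println(avgFieldLength(-1, 300, 100));
        // Frequencies present: uses sumTotalTermFreq / docCount.
        System.out.println(avgFieldLength(1000, 300, 100));
    }
}
```

With frequencies omitted, the result is the average number of unique terms per document, which keeps length normalization meaningful instead of collapsing it to a constant.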

Attachments

LUCENE-8025.patch (4 kB, Robert Muir, 31/Oct/17 02:36)

People

Assignee: Unassigned
Reporter: Robert Muir
Votes: 0
Watchers: 2

Dates

Created: 31/Oct/17 02:26
Updated: 28/Aug/22 15:21
Resolved: 01/Nov/17 23:46