[LUCENE-8633] Remove term weighting from interval scoring - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 8.0, 9.0
Component/s: None
Labels: None
Lucene Fields: New

Description

IntervalScorer currently uses the same scoring mechanism as SpanScorer: it sums the IDF of all possibly matching terms from its parent IntervalsSource and combines that with a sloppy frequency to produce a similarity-based score. This doesn't really make sense, however. It means that terms that don't appear in a document can still contribute to the score, and it makes scores from interval queries appear comparable with scores from term or phrase queries when they really aren't.

I'd like to explore a different scoring mechanism for intervals, based purely on sloppy frequency and ignoring term weighting. This should make the scores easier to reason about, as well as making them useful for things like proximity boosting on boolean queries.
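To make the idea concrete, here is a minimal sketch of a score that depends only on sloppy frequency. It is an illustration under stated assumptions, not the code in the attached patch: the class name `IntervalScoreSketch`, the `1 / width` contribution per match, and the saturating `freq / (freq + pivot)` form with a `pivot` parameter are all hypothetical choices made for this example.

```java
public class IntervalScoreSketch {

    /**
     * Sloppy frequency: each matching interval contributes 1 / width,
     * so narrow matches count more than wide ones.
     * Each entry of matches is {start, end} with inclusive positions (assumed layout).
     */
    static double sloppyFreq(int[][] matches) {
        double freq = 0.0;
        for (int[] m : matches) {
            int width = m[1] - m[0] + 1;
            freq += 1.0 / width;
        }
        return freq;
    }

    /**
     * Score built from sloppy frequency alone, with no IDF or other term weighting.
     * The saturation keeps the result in [0, boost); pivot must be > 0
     * (hypothetical parameter, the frequency at which the score reaches boost / 2).
     */
    static double score(int[][] matches, double boost, double pivot) {
        double freq = sloppyFreq(matches);
        return boost * freq / (freq + pivot);
    }

    public static void main(String[] args) {
        int[][] tight = {{3, 4}};   // two adjacent positions
        int[][] loose = {{3, 12}};  // the same terms spread over ten positions
        System.out.println("tight: " + score(tight, 1.0, 1.0));
        System.out.println("loose: " + score(loose, 1.0, 1.0));
    }
}
```

Because no term statistics enter the calculation, a term that is absent from the document cannot affect the result, and the bounded score could plausibly be added to a boolean query as a proximity boost.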

Attachments

- LUCENE-8633.patch, 27 kB, 11/Jan/19 10:22, Alan Woodward
- LUCENE-8633.patch, 37 kB, 14/Jan/19 10:45, Alan Woodward

People

Assignee: Alan Woodward

Reporter: Alan Woodward

Votes: 0

Watchers: 3

Dates

Created: 11/Jan/19 10:22

Updated: 28/Aug/22 15:40

Resolved: 16/Jan/19 14:11