[LUCENE-6986] Add more DFI independence measures - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.5, 6.0
Component/s: None
Labels: None
Lucene Fields: New

Description

Since LUCENE-6818, we have DFISimilarity, which implements the normalized chi-squared distance.

But there are other alternatives (as described in http://trec.nist.gov/pubs/trec21/papers/irra.web.nb.pdf):

- normalized chi-squared: "can be used for tasks that require high precision, against both short and long queries"
- standardized: "good at tasks that require high recall and high precision, especially against short queries composed of a few words as in the case of Internet searches"
- saturated: "for tasks that require high recall against long queries"

I think we should just provide the three independence measures and let the user choose, similar to how we do DFR/IB/etc.
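As a rough illustration of what "let the user choose" could look like, here is a minimal self-contained sketch of the three measures as interchangeable scoring functions. The names `IndependenceSketch` and `IndependenceSketchDemo` are hypothetical and are not taken from the patch. The formulas are assumptions based on the usual forms of these statistics (observed frequency versus the frequency expected under independence) and may differ in detail from what was committed.

```java
/**
 * Illustrative sketch only: hypothetical names, not the classes from the patch.
 * Each constant scores how far an observed term frequency departs from the
 * frequency expected under independence. Assumes expected > 0.
 */
enum IndependenceSketch {
    /** Normalized chi-squared: squared departure relative to the expectation. */
    CHI_SQUARED {
        @Override
        double score(double freq, double expected) {
            double diff = freq - expected;
            return diff * diff / expected;
        }
    },
    /** Standardized: departure divided by the square root of the expectation. */
    STANDARDIZED {
        @Override
        double score(double freq, double expected) {
            return (freq - expected) / Math.sqrt(expected);
        }
    },
    /** Saturated: departure divided by the expectation itself. */
    SATURATED {
        @Override
        double score(double freq, double expected) {
            return (freq - expected) / expected;
        }
    };

    abstract double score(double freq, double expected);
}

final class IndependenceSketchDemo {
    public static void main(String[] args) {
        double freq = 9.0;      // observed term frequency (made-up value)
        double expected = 4.0;  // expected frequency under independence (made-up value)
        // Print the score each measure assigns to the same observation.
        for (IndependenceSketch measure : IndependenceSketch.values()) {
            System.out.printf("%-12s %.4f%n", measure, measure.score(freq, expected));
        }
    }
}
```

Keeping the measure behind a single `score(freq, expected)` method means the surrounding similarity code would not need to change when the user picks a different measure, which is the same pattern the DFR and IB similarities follow with their pluggable components.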

Attachments

- LUCENE-6986.patch, 18 kB, Robert Muir, 21/Jan/16 06:28
- LUCENE-6986.patch, 18 kB, Robert Muir, 21/Jan/16 05:47
- LUCENE-6986.patch, 19 kB, Robert Muir, 21/Jan/16 05:43

People

Assignee: Unassigned

Reporter: Robert Muir

Votes: 0

Watchers: 1

Dates

Created: 21/Jan/16 05:43

Updated: 28/Aug/22 14:48

Resolved: 22/Jan/16 13:55