[LUCENE-3290] add FieldInvertState.numUniqueTerms, Terms.sumDocFreq - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.4, 4.0-ALPHA
Component/s: core/index
Labels: None
Lucene Fields: New

Description

For scoring systems like lnu.ltc (http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf), we need to supply three statistics:

average tf within document d
number of unique terms within d
average number of unique terms across the field

If we add FieldInvertState.numUniqueTerms, you can incorporate the first two into your norms/docvalues (once we cut over):
the average tf within d is length / numUniqueTerms.
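
As a rough illustration of the per-document arithmetic, here is a minimal sketch in plain Java. It does not use any Lucene API: the class `DocStatsSketch` and its methods are hypothetical names chosen for this example, and a document is modelled simply as a list of already-analyzed tokens.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not Lucene code: mimics the counters an indexer
// could track while inverting one field of one document.
final class DocStatsSketch {

    // Number of tokens in the document (the field "length").
    static int length(List<String> tokens) {
        return tokens.size();
    }

    // Number of distinct terms in the document.
    static int numUniqueTerms(List<String> tokens) {
        Set<String> unique = new HashSet<>(tokens);
        return unique.size();
    }

    // Average term frequency within the document: length / numUniqueTerms.
    // Returns 0.0 for an empty document to avoid dividing by zero.
    static double averageTf(List<String> tokens) {
        int unique = numUniqueTerms(tokens);
        return unique == 0 ? 0.0 : (double) length(tokens) / unique;
    }
}
```

For the tokens `a b a c a b`, the length is 6 and there are 3 unique terms, so the average tf is 2.0.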

To compute the average across the field, we can write the sum of all terms' docFreqs into the terms dictionary header;
dividing this sum by maxDoc then gives the average.
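
The field-wide average works because each document adds one to the docFreq of every distinct term it contains, so the sum of all docFreqs equals the total number of unique terms summed over documents. A minimal sketch in plain Java, again without any Lucene API; `FieldStatsSketch` and its method names are hypothetical, and documents are modelled as token lists:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Lucene code: derives field-level statistics
// from a toy in-memory corpus.
final class FieldStatsSketch {

    // docFreq per term: the number of documents containing the term at least once.
    static Map<String, Integer> docFreqs(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            // Deduplicate so each document counts once per term.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Sum of docFreq over all terms in the field.
    static long sumDocFreq(Map<String, Integer> df) {
        long sum = 0;
        for (int freq : df.values()) {
            sum += freq;
        }
        return sum;
    }

    // Average number of unique terms per document: sumDocFreq / maxDoc.
    static double averageUniqueTerms(long sumDocFreq, int maxDoc) {
        return maxDoc == 0 ? 0.0 : (double) sumDocFreq / maxDoc;
    }
}
```

For three documents `a b a`, `b c`, and `a`, the docFreqs are a=2, b=2, c=1, which sum to 5; dividing by a maxDoc of 3 gives an average of about 1.67 unique terms per document.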

Attachments

LUCENE-3290.patch, 08/Jul/11 11:25, 28 kB, Robert Muir
LUCENE-3290.patch, 08/Jul/11 02:27, 28 kB, Robert Muir

People

Assignee: Robert Muir

Reporter: Robert Muir

Votes: 0

Watchers: 0

Dates

Created: 08/Jul/11 02:25

Updated: 28/Aug/22 12:52

Resolved: 09/Jul/11 11:18