[LUCENE-3174] Similarity.Stats class for term & collection statistics - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: flexscoring branch
Fix Version/s: flexscoring branch
Component/s: core/search
Labels: None
Lucene Fields: New, Patch Available

Description

To support ranking methods besides TF-IDF, we need to make the statistics they need available. These statistics could be computed in computeWeight (soon to become computeStats) and stored in a separate object for easy access. Since this object will be used solely by subclasses of Similarity, it should be implemented as a static inner class, i.e. Similarity.Stats.

There are two ways this could be implemented:

- As a single Similarity.Stats class, reused by all ranking algorithms. In this case, the class would have a member field for each statistic.
- As a hierarchy of Stats classes, one for each ranking algorithm. Each subclass would define only the statistics its ranking algorithm needs.

In the second case, the Stats class in DefaultSimilarity would have a single field, idf, while the one in e.g. BM25Similarity would have idf and average field/document length.
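
The two alternatives can be sketched roughly as follows. This is only an illustration, not the code in the attached patches: every class name here (StatsSketch, AllStats, TfIdfStats, BM25Stats, SimilaritySketch and its subclasses) and the computeStats signature are hypothetical, and the idf and average-length formulas are assumed textbook forms chosen to make the example concrete.

```java
// Hypothetical sketch; none of these names are taken from the actual patch.
class StatsSketch {

    /** Option 1: a single class with one field per statistic any method might need. */
    static class AllStats {
        float idf;
        float avgFieldLength;
        // Every new ranking method would add its fields here.
    }

    /** Option 2: root of a hierarchy; each similarity defines its own subclass. */
    static abstract class Stats {}

    /** Statistics for a TF-IDF style similarity: idf only. */
    static class TfIdfStats extends Stats {
        final float idf;
        TfIdfStats(float idf) { this.idf = idf; }
    }

    /** Statistics for a BM25 style similarity: idf plus average field length. */
    static class BM25Stats extends Stats {
        final float idf;
        final float avgFieldLength;
        BM25Stats(float idf, float avgFieldLength) {
            this.idf = idf;
            this.avgFieldLength = avgFieldLength;
        }
    }

    /** Stand-in for Similarity; computeStats plays the role of the renamed computeWeight. */
    static abstract class SimilaritySketch {
        abstract Stats computeStats(long docFreq, long numDocs, long sumFieldLength);
    }

    static class TfIdfSimilaritySketch extends SimilaritySketch {
        @Override
        Stats computeStats(long docFreq, long numDocs, long sumFieldLength) {
            // Assumed idf form: 1 + ln(numDocs / (docFreq + 1)).
            float idf = (float) (1.0 + Math.log((double) numDocs / (docFreq + 1)));
            return new TfIdfStats(idf);
        }
    }

    static class BM25SimilaritySketch extends SimilaritySketch {
        @Override
        Stats computeStats(long docFreq, long numDocs, long sumFieldLength) {
            // Assumed idf form: ln(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5)).
            float idf = (float) Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
            // Assumed simplification: average length = total field length / number of documents.
            float avgFieldLength = (float) sumFieldLength / numDocs;
            return new BM25Stats(idf, avgFieldLength);
        }
    }

    public static void main(String[] args) {
        Stats tfidf = new TfIdfSimilaritySketch().computeStats(9, 10, 1000);
        Stats bm25 = new BM25SimilaritySketch().computeStats(9, 10, 1000);
        System.out.println("TF-IDF idf: " + ((TfIdfStats) tfidf).idf);
        System.out.println("BM25 avg field length: " + ((BM25Stats) bm25).avgFieldLength);
    }
}
```

With the hierarchy, each similarity only sees the fields it asked for, at the cost of a cast from the base Stats type wherever the object is read back. The single-class variant avoids the cast but grows with every new ranking method.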

Attachments

- LUCENE-3174.patch, 07/Jun/11 12:52, 14 kB, David Mark Nemeskey
- LUCENE-3174.patch, 11/Jun/11 10:12, 14 kB, David Mark Nemeskey
- LUCENE-3174.patch, 13/Jun/11 13:57, 42 kB, David Mark Nemeskey
- LUCENE-3174.patch, 16/Jun/11 06:58, 62 kB, Robert Muir
- LUCENE-3174.patch, 16/Jun/11 07:34, 63 kB, Robert Muir
- LUCENE-3174.patch, 16/Jun/11 08:10, 66 kB, Robert Muir
- LUCENE-3174_normalize_boost.patch, 11/Jun/11 18:50, 13 kB, Robert Muir

People

Assignee: David Mark Nemeskey

Reporter: David Mark Nemeskey

Votes: 0

Watchers: 0

Dates

Created: 05/Jun/11 15:21

Updated: 28/Aug/22 12:49

Resolved: 20/Jun/11 11:44