LUCENE-2959: [GSoC] Implementing State of the Art Ranking for Lucene


      Description

      Lucene employs the Vector Space Model (VSM) to rank documents, which compares
      unfavorably to state-of-the-art algorithms such as BM25. Moreover, the architecture is
      tailored specifically to VSM, which makes the addition of new ranking functions a
      non-trivial task.

      This project aims to bring state of the art ranking methods to Lucene and to implement a
      query architecture with pluggable ranking functions.

      The wiki page for the project can be found at http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking.

      Attachments

      1. implementation_plan.pdf
        49 kB
        David Mark Nemeskey
      2. LUCENE-2959_mockdfr.patch
        8 kB
        Robert Muir
      3. LUCENE-2959_nocommits.patch
        22 kB
        Robert Muir
      4. LUCENE-2959.patch
        435 kB
        Robert Muir
      5. LUCENE-2959.patch
        539 kB
        Robert Muir
      6. proposal.pdf
        85 kB
        David Mark Nemeskey


          Activity

          Andrzej Bialecki added a comment -

          You are probably familiar with this paper and the code... just in case I'm adding a reference here: http://arxiv.org/abs/0911.5046

          David Mark Nemeskey added a comment -

          Andrzej: thanks! Indeed, I have read that paper, but have only skimmed through the code. I am also aware of at least one BM25 implementation for Lucene, which may or may not be what issue LUCENE-2091 is about. I need to have a look into it.

          David Mark Nemeskey added a comment -

          Added a link to the issue mentioned in my last comment (and yet another one). The relation between them is not entirely clear to me yet; I should consult with Robert first.

          Robert Muir added a comment -

          Hi David, to try to help get things moving, I created a branch: https://svn.apache.org/repos/asf/lucene/dev/branches/flexscoring

          This is the in-progress work from LUCENE-2392, which separates the scoring calculations from the postings-list matching. In short, Similarity becomes very low-level, but the idea is that you extend it to present a higher-level API (for example TFIDFSimilarity: http://svn.apache.org/repos/asf/lucene/dev/branches/flexscoring/lucene/src/java/org/apache/lucene/search/TFIDFSimilarity.java) that is user-friendly and allows users to adjust parameters in a way that makes sense to that scoring system.
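
          For illustration, a minimal sketch of what extending such a high-level API can look like (assuming the 3.x-style tf()/idf() hooks; DefaultSimilarity supplies the remaining plumbing, and the exact signatures on the branch may differ):

            import org.apache.lucene.search.DefaultSimilarity;

            // Subclass the high-level TF-IDF API: override only the formula
            // hooks and let the framework handle matching, norms and weights.
            public class MyTfIdfSimilarity extends DefaultSimilarity {
              @Override
              public float tf(float freq) {
                return (float) Math.sqrt(freq);      // classic Lucene tf
              }

              @Override
              public float idf(int docFreq, int numDocs) {
                return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
              }
            }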

          As a start I implemented some very very rough/basic models in src/test:
          BM25: http://svn.apache.org/repos/asf/lucene/dev/branches/flexscoring/lucene/src/test/org/apache/lucene/search/MockBM25Similarity.java

          Dirichlet LM: http://svn.apache.org/repos/asf/lucene/dev/branches/flexscoring/lucene/src/test/org/apache/lucene/search/MockLMSimilarity.java

          But these are in no way correct or extensible or nice. For example, the BM25 similarity is slow, because as implemented its "average document length" is "live" (e.g. if you add more segments it's immediately adjusted for each query)... there is no caching at all.

          For example in this case, to speed up BM25, it could be nice for the Similarity to pull this up-front, and create cached calculations. If a user wants to refresh their bm25 stats then they could do something in SimilarityProvider to call it to recalculate the caches.

          However, for a user that wants "super-realtime" view, it might be better for it to stay the way it is now, or alternatively for the Sim to up-front do the 256 calculations per query (ideally in weight, not per-segment in docscorer) to tableize the length normalizations.
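
          For illustration, a hedged sketch of the "tableize" idea (decodeDocLength() and the surrounding names are placeholders, not the branch API):

            // Per query, in the weight: precompute the BM25 length-norm factor
            // for each of the 256 possible norm bytes, so per-document scoring
            // becomes a single array lookup.
            static float[] buildNormCache(float k1, float b, float avgDocLength) {
              float[] cache = new float[256];
              for (int i = 0; i < 256; i++) {
                float docLen = decodeDocLength((byte) i);  // placeholder decoder
                cache[i] = k1 * ((1 - b) + b * docLen / avgDocLength);
              }
              return cache;
            }
            // Per document, in the doc scorer:
            //   score = idf * freq * (k1 + 1) / (freq + cache[normByte & 0xFF])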

          So these are the API challenges we need to consider if we want to provide actual implementations of these scoring systems: how to make them perform close to or as fast as lucene's current scoring model.

          Separately on the issue, I want to make Weight completely opaque to the sim; really it's just a way for a Similarity to compute things up front (such as IDF, but maybe things like these bm25 length norm caches too). Currently it can only hold a single float value (see my un-sqrt'ing and other hacks in the Mock sims), so this should be fixed.

          Additionally, another big TODO: just as Scorer was split (maybe we should rename it to Matcher now that the sim does the calcs?), the process of Explanations needs to be split too, so that a Sim is completely responsible for explaining itself.

          Another TODO I have is to write the norm summation into the norms file as a single vlong, rather than computing it across all byte[] in SegmentReader like I do now... I just implemented it this way so that we could play with scoring algorithms easily.

          So, the good news would be that scoring is a lot more flexible, but the "bad" news is that in order to support lucene's features, implementing a new ranking system on top of Similarity is really serious work, as you need to:

          1. implement the lower-level API efficiently, yet expose a nice high-level API such as TFIDFSimilarity's tf() and idf() hooks for users.
          2. implement explanations so that users can debug relevance issues.
          3. think about allowing users to balance the various performance tradeoffs, such as the performance gained by caching things versus using realtime statistics (some of this could be in my head; maybe computing 256 norm decoder caches up-front is really cheap and a non-issue).
          4. consider how to integrate Lucene's features into the ranking system: for example, how to estimate a reasonable "phrase IDF" for phrase/multiphrase/span queries; how to integrate index-time boosts (in my example BM25 etc., I just made the documents appear shorter to accomplish this); and, depending upon how the length normalization is being stored in the index, how to pick the best quantization (might not be SmallFloat352), etc.
          5. do all the relevance testing to ensure that things are correct (I found lots of bugs doing rough testing on my Mock ones, and there are probably more, but on the couple of test collections I tried they seemed reasonable).
          6. add good-quality documentation, such as what we have today in TFIDFSimilarity, that explains how the ranking system works and how you can tune it.
          Robert Muir added a comment -

          David, for your perusal here is another sim I tried to write: DFR I(F)L2.

          It's probably got bugs, but it demonstrates again the challenges here.

          If we want to support ranking systems like this, how can they be made fast?

          The one I wrote has no score caching, so it does a lot of per-document divisions, multiplications, etc., and this is no good.

          So it's going to be hard to make these have competitive performance with Lucene's current scoring, which for TF < 32 is an array lookup and a single multiplication.
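
          For comparison, that fast path looks roughly like this (paraphrased from TermScorer, not a verbatim copy):

            class ScoreSketch {
              private static final int SCORE_CACHE_SIZE = 32;
              private final float[] scoreCache = new float[SCORE_CACHE_SIZE];
              private final float[] normDecoder = new float[256]; // decoded norm bytes
              private final float weightValue;                    // idf * boost, etc.

              ScoreSketch(float weightValue) {
                this.weightValue = weightValue;
                for (int tf = 0; tf < SCORE_CACHE_SIZE; tf++) {
                  scoreCache[tf] = tf(tf) * weightValue;          // cache small tf values
                }
              }

              float tf(float freq) { return (float) Math.sqrt(freq); }

              float score(int tf, byte normByte) {
                float raw = (tf < SCORE_CACHE_SIZE)
                    ? scoreCache[tf]                              // array lookup
                    : tf(tf) * weightValue;                       // rare slow path
                return raw * normDecoder[normByte & 0xFF];        // one multiplication
              }
            }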

          It's more obvious to me how to eke good performance out of the language modelling formula, because you can re-arrange the log and boil it down to some addition; but we need to get creative thinking about how to make some of these other models fast, and it's more complicated if you want to make, say, a DFR "framework" that allows you to pick the basic model and the 2 normalizations, versus specializing the code for each possibility (and there are many).
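
          To make the "re-arrange the log" point concrete, a sketch for Dirichlet smoothing (notation mine, not from the patch):

            \log p(t \mid d) \;=\; \log\frac{tf_{t,d} + \mu\, p(t \mid C)}{|d| + \mu}
                            \;=\; \log\bigl(tf_{t,d} + \mu\, p(t \mid C)\bigr) \;-\; \log\bigl(|d| + \mu\bigr)

          The term \mu\, p(t \mid C) is a per-query constant, and the subtracted logarithm depends only on the document length, so it can come from a small (e.g. 256-entry) table; only the first logarithm varies with tf.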

          My advice to you for GSOC would be to just pick one of these (e.g. BM25) and figure out how to do it really well, good performance, good api and documentation, and good relevance testing to ensure its quality.

          I'm more than happy to help with the boring parts like refactoring Lucene's Explanations API.

          David Mark Nemeskey added a comment -

          Robert: thanks for all the info! It's nice to see so much work has already been done. I plan to delve into it after the selection, and try to get other things out of the way until then, so that I can concentrate on GSoC during the summer.

          I think the main point would be to make the addition of a new ranking function as easy as possible. At least a prototype implementation should be very straightforward, even at the expense of performance. Then, if the new method provides good results, the developer can go on to the lower level to squeeze more juice out of it. It's hard for me to discuss this without knowing the code, of course, but do you think it is possible?

          Even though I added a "Performance" section to my proposal (http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/davidnemeskey/1), I see now that it's probably more important than I believed it to be at first. I think I will follow your advice and concentrate on how to make BM25F fast. It may be a tougher nut to crack than DFR, as the latter has logarithms scattered all over it. However, the first thing that comes to mind is that the tf–BM25 curve becomes almost flat very quickly (less so for a high k1 value, though). So it may be possible to pre-compute a tf map or array for a query, as sketched below.
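
          A hedged sketch of that precomputation (the cutoff is an arbitrary illustrative choice and decodeDocLength() is a placeholder; since the denominator K varies per document through the norm byte, the table is indexed by both the norm byte and tf):

            // Per query: tabulate the BM25 tf saturation for small tf values,
            // for each of the 256 encoded document lengths. Past the cutoff,
            // fall back to the exact formula (the curve is nearly flat there).
            static final int CUTOFF = 64;                      // arbitrary

            static float[][] buildTfTable(float k1, float b, float avgdl) {
              float[][] table = new float[256][CUTOFF];
              for (int n = 0; n < 256; n++) {
                float K = k1 * ((1 - b) + b * decodeDocLength((byte) n) / avgdl);
                for (int tf = 0; tf < CUTOFF; tf++) {
                  table[n][tf] = tf * (k1 + 1) / (tf + K);
                }
              }
              return table;
            }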

          Robert Muir added a comment -

          I think the main point would be to make the addition of a new ranking function as easy as possible. At least a prototype implementation should be very straightforward, even at the expense of performance. Then, if the new method provides good results, the developer can go on to the lower level to squeeze more juice out of it. It's hard for me to discuss this without knowing the code, of course, but do you think it is possible?

          This sounds great! For example, you could extend the low-level API, gather every possible statistic that Lucene has, and present a high-level API that looks more like Terrier's scoring API (which I'm guessing is what researchers would prefer?), where they basically implement the scoring in one method with all the stats there.

          So someone would extend this API to do prototyping; it would make it easier to experiment. Something like the sketch below.
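
          A sketch of the shape such a prototype-friendly API could take (all names here are hypothetical; as it happens, this is roughly the form the SimilarityBase mentioned later in this thread ended up with):

            // One method, all the statistics: a researcher implements only the
            // formula and the framework gathers everything else.
            public abstract class EasySimilarity {
              public static class BasicStats {
                public long docFreq;             // docs containing the term
                public long totalTermFreq;       // term occurrences in the collection
                public long numberOfDocuments;   // collection size
                public float avgFieldLength;     // average field length
              }

              /** Score one (term, document) match from the gathered statistics. */
              protected abstract float score(BasicStats stats, float freq, float docLen);
            }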

          I think I will follow your advice and concentrate on how to make BM25F fast.

          Actually as far as BM25f, this one presents a few challenges (some already discussed on LUCENE-2091).

          To summarize:

          • for any field, Lucene has a per-field terms dictionary that contains that term's docFreq. Computing BM25F's IDF would be challenging, because it wants a docFreq "across all the fields". (It's not clear to me at a glance from the original paper whether this should be across only the fields in the query or across all the fields in the document, and whether a "static" schema is implied in this scoring system; in Lucene, document 1 can have 3 fields and document 2 can have 40 different ones, even with different properties.)
          • the same issue applies to length normalization: Lucene has a "field length" but really no concept of document length (see the BM25F formulation sketched below).
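
          For reference, one common formulation of BM25F (after Robertson and Zaragoza; notation mine), which shows where the cross-field statistics come in:

            \tilde{tf}(t,d) \;=\; \sum_{f} w_f \cdot \frac{tf(t,f,d)}{1 + b_f \left( \frac{l_f(d)}{\bar{l}_f} - 1 \right)}
            \qquad
            score(q,d) \;=\; \sum_{t \in q} \frac{\tilde{tf}(t,d)}{k_1 + \tilde{tf}(t,d)} \cdot idf(t)

          Here idf(t) is computed from a document-level docFreq – exactly the "across all the fields" statistic that Lucene's per-field terms dictionaries do not provide.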

          So I just wanted to mention that while it's possible here to apply a per-field TF boost before the non-linear TF saturation, it's not immediately clear how to adapt the BM25F formula to Lucene: how to combine these scores without using a (wasteful) "catch-all field" and some lying behind the scenes to force this catch-all field's length normalization and docFreq to be used.

          Too many questions arise for BM25F and how it would "fit" with Lucene, for example the fact that "multiple fields" can really mean anything, and having a field in Lucene doesn't mean at all that it was in your original document! For example, Solr users frequently use a "copyField" to take the content of one field and duplicate it to a different field (perhaps applying some processing). In terms of things like length normalization, it seems that a "document length" calculated as the sum across the fields would be wrong for many use cases.

          I only wanted to recommend against this one because of this rather serious challenge; it seems it's something we might want to table at the moment: Lucene is changing fast, and as new capabilities arise we might realize there is a more elegant way to address this... but at the moment I think I would recommend starting with BM25.

          David Mark Nemeskey added a comment -

          Robert,

          As for the problems with BM25F

          • for any field, Lucene has a per-field terms dictionary that contains that term's docFreq. Computing BM25F's IDF would be challenging, because it wants a docFreq "across all the fields".
          • the same issue applies to length normalization: Lucene has a "field length" but really no concept of document length.

          One thing that is not clear to me is why these limitations would not be a problem for BM25. As I see it, the difference between the two methods is that BM25 simply computes tfs, idfs and document length from the whole document – which, according to what you said, is not available in Lucene. That's why I figured that a variant of BM25F would actually be more straightforward to implement.

          (It's not clear to me at a glance from the original paper whether this should be across only the fields in the query or across all the fields in the document, and whether a "static" schema is implied in this scoring system; in Lucene, document 1 can have 3 fields and document 2 can have 40 different ones, even with different properties.)

          Actually, I am not sure there is a consensus on what BM25F actually is. For example, the BM25 formula can be applied to the weighted sum of field tfs, or alternatively, the per-field BM25 scores can be summed after normalization. I've seen both called (maybe incorrectly) BM25F; see the two formulations below.
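
          The two variants, side by side (notation mine):

            score_A(q,d) \;=\; \sum_{t \in q} \frac{\sum_f w_f\, tf(t,f,d)}{k_1 + \sum_f w_f\, tf(t,f,d)} \cdot idf(t)
            \qquad\text{vs.}\qquad
            score_B(q,d) \;=\; \sum_f w_f \cdot BM25(q, d_f)

          Variant A combines field frequencies before the non-linear saturation (often considered the canonical BM25F); variant B saturates per field and then combines, which is closer to what a per-field index supports directly.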

          If I understand correctly, the current scoring algorithm takes into account only the fields explicitly specified in the query. Is that right? If so, I see no reason why BM25 should behave otherwise. Which of course also means that we probably won't be able to save the summed doc length and IDF.

          Robert, would you be so kind as to have a look at my proposal? It can be found at http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/davidnemeskey/1. It's basically the same as what I sent to the mailing list. I wrote that I want to implement BM25, BM25F and DFR ("the framework", I meant, with one or two smoothing models), as well as to convert the original scoring to the new framework. In light of the thread here, I guess it would be better to modify these goals, perhaps by:

          • deleting the conversion part?
          • committing myself to BM25/BM25F only?
          • explicitly stating that I want a higher level API based on the low-level one?

          As for the last item, that only applies if I continue/join the work in LUCENE-2392. Since I guess nobody wants two ranking frameworks, of course I will; but then, in this part of the proposal, should I just concentrate on the higher-level API?

          Thanks!

          Robert Muir added a comment -

          One thing that is not clear to me is why these limitations would not be a problem for BM25. As I see it, the difference between the two methods is that BM25 simply computes tfs, idfs and document length from the whole document – which, according to what you said, is not available in Lucene. That's why I figured that a variant of BM25F would actually be more straightforward to implement.

          A variant sounds really interesting? I think you know better than me here, I just looked at the original paper and thought to myself that to implement this "by the book" might not be feasible for a while.

          Robert, would you be so kind to have a look at my proposal? It can be found at http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/davidnemeskey/1. It's basically the same as what I sent to the mailing list. I wrote that I want to implement BM25, BM25F and DFR ("the framework", I meant with one or two smoothing models), as well as to convert the original scoring to the new framework. In light of the thread here, I guess it would be better to modify these goals, perhaps by:

          deleting the conversion part?
          committing myself to BM25/BM25F only?
          explicitly stating that I want a higher level API based on the low-level one?

          I think you can decide what you want to do? Obviously I would love to see all of it done.

          But its your choice, I could see you going a couple different ways:

          • closer to your original proposal, you could still develop a flexible scoring API on top of Similarity. Hey, all I did was move stuff from Scorer to Similarity really, which does give flexibility, but it's probably not what an IR researcher would want (it's low-level and confusing). So you could make a "SimpleSimilarity" or "EasySimilarity" or something that presents a much simpler API (something closer to what Terrier/Indri present) on top of this, for easily implementing ranking functions? I think this would be extremely valuable long-term: who cares if we have a low-level flexible scoring API that only speed demons like, but IR practitioners find confusing and hideous? Someone who is trying to experiment with an enhancement to relevance likely doesn't care if their TREC run takes 30 seconds instead of 20 seconds, if the API is really easy and they aren't wasting time fighting with Lucene. If you go this route, you could implement BM25, DFR, etc. as you suggested, as examples of how to use this API, and there would be more of a focus on API quality and simplicity instead of performance.
          • or alternatively, you could refine your proposal to implement a really "production strength" version of one of these scoring systems on top of the low-level API, that would ideally have competitive performance/documentation/etc with Lucene's default scoring today. If you decide to do this, then yes, I would definitely suggest picking only one, because I think its a ton of work as I listed above, and I think there would be more focus on practical things (some probably being nuances of lucene) and performance.
          David Mark Nemeskey added a comment -

          I think you can decide what you want to do?

          Fair enough. I guess I'll stick with my original proposal then, though I might change a few things here and there; maybe change the focus from flexibility (as it seems to be already underway) to simplicity.

          Robert Muir added a comment -

          Setting myself as assignee as I'd like to mentor this one.

          David Mark Nemeskey added a comment -

          Thanks Robert, that would be terrific.

          David Mark Nemeskey added a comment -

          Robert: maybe we could resolve this issue as well? Once we decide what to do with LUCENE-3173 – perhaps a Won't Fix?

          Robert Muir added a comment -

          I think we can defer that one or just leave it open for consideration later.

          As far as this issue, let's keep it open until we merge the branch to trunk!

          Robert Muir added a comment -

          I rearranged the BM25 in the branch a little bit; it's now as fast as Lucene's ranking formula:

                          Task   QPS tfidf StdDev tfidf   QPS bm25 StdDev bm25      Pct diff
                      SpanNear        4.29        0.52        4.14        0.49  -24% -   22%
                        Phrase        3.97        0.25        3.89        0.25  -13% -   11%
                          Term       82.18        4.78       81.00        2.56   -9% -    7%
                TermBGroup1M1P       83.30        2.41       82.12        2.20   -6% -    4%
                  SloppyPhrase        8.03        0.31        7.93        0.43  -10% -    8%
                   AndHighHigh       19.38        0.59       19.16        0.71   -7% -    5%
                      PKLookup      175.49        4.33      173.67        4.20   -5% -    3%
                    AndHighMed       40.99        1.12       40.71        1.07   -5% -    4%
                   TermGroup1M       25.69        0.39       25.69        0.44   -3% -    3%
                        Fuzzy2       42.62        1.83       42.65        1.80   -8% -    8%
                        Fuzzy1       91.74        3.48       91.86        3.44   -7% -    7%
                       Respell       73.96        3.30       74.18        3.29   -8% -    9%
                      Wildcard       56.33        0.97       56.60        1.08   -3% -    4%
                       Prefix3       33.36        0.83       33.59        0.97   -4% -    6%
                  TermBGroup1M       55.58        1.03       56.17        0.88   -2% -    4%
                        IntNRQ       13.38        0.74       13.58        0.94  -10% -   14%
                     OrHighMed       11.71        1.18       11.94        0.97  -14% -   22%
                    OrHighHigh        8.91        0.74        9.13        0.63  -11% -   19%
          
          David Mark Nemeskey added a comment -

          Hi Robert,

          I would very much like to run this test on the other sims as well. How do I do that?

          David

          Robert Muir added a comment -

          There is a project here that lets you benchmark using wikipedia: http://code.google.com/a/apache-extras.org/p/luceneutil/

          You need this patch to benchmark Similarities: http://code.google.com/a/apache-extras.org/p/luceneutil/issues/detail?id=6

          (More information on how to get started here: http://code.google.com/a/apache-extras.org/p/luceneutil/source/browse/README.txt)

          Robert Muir added a comment -

          Patch removing all nocommits.

          For the fake IDF/phrase issue, I thought it best not to feed "fake" statistics to SimilarityBase, since the whole point is to make it simpler to implement/test ranking models.

          Instead it sums scores across terms (kind of like BooleanQuery).

          For DFR P and D, I don't think there are really any great practical ways out of the fundamental problem. I added notes to both of these.

          I think the workaround for Dirichlet is fine; I looked around and found another implementation of this smoothing by Hiemstra and it had the same workaround (http://mirex.sourceforge.net, trec.nist.gov/pubs/trec19/papers/univ.twente.web.rev.pdf).

          All the other similarities seem to work fine when randomly swapped into Lucene's tests.

          Robert Muir added a comment -

          Attached is a diff (made with dev-tools/scripts/diff-sources.py) between the branch and trunk.

          I think this is ready.

          Robert Muir added a comment -

          Oops, I had some stuff in solr/example/work that caused a lot of noise in the patch.

          Robert Muir added a comment -

          Committed to trunk: Thank you David for the fantastic work here!

          Michael McCandless added a comment -

          Thanks David and Robert!

          What an incredible step forward: now you can easily try out all sorts of pre-existing scoring models, or make your own. Yay

          hadas raviv added a comment -

          Hi,

          First of all, I would like to thank you for the great contribution you made by adding state-of-the-art ranking methods to Lucene. I was waiting for these features for a long time, since they enable an IR researcher like me to use Lucene, which is a powerful tool, for research purposes.

          I downloaded the latest version of Lucene trunk and played a little with the models you implemented. There is a question I have, and I would really appreciate your answer (my apologies in advance – I'm new to Lucene, so maybe this question is trivial for you):

          I saw that you didn't change the default implementation of lucene for coding the document length which is used for ranking in language models (one byte for coding the document length together with boosting). Why did you decide that? Is it possible to save the "real" document length coded in some other way (maybe with the new flexible index)? Is there any example for such an implementation? It is just that I'm concerned with the effect of using an inaccurate document length on results quality. Did you check this issue?

          In addition – do you know about intentions to implement some more advanced ranking models (such as relevance models, MRF) in the near future?

          Thanks in advance,
          Hadas

          Robert Muir added a comment -

          I saw that you didn't change the default implementation of lucene for coding the document length which is used for ranking in language models (one byte for coding the document length together with boosting). Why did you decide that?

          So that you can switch between ranking models without re-indexing.
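
          A small demonstration of why this works (SmallFloat and the floatToByte315/byte315ToFloat pair are the real org.apache.lucene.util helpers; the reinterpretation comment reflects my understanding of the branch's length-based models):

            import org.apache.lucene.util.SmallFloat;

            public class NormByteDemo {
              public static void main(String[] args) {
                int fieldLength = 100;                // tokens in the field
                float boost = 1.0f;
                // Classic norm: boost / sqrt(length), squeezed into one byte.
                byte norm = SmallFloat.floatToByte315(boost / (float) Math.sqrt(fieldLength));
                float decoded = SmallFloat.byte315ToFloat(norm);
                // A length-based model reinterprets the very same byte as an
                // (approximate) length, so no re-indexing is needed:
                float approxLength = 1f / (decoded * decoded); // ~ fieldLength
                System.out.println(norm + " " + decoded + " " + approxLength);
              }
            }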

          It is just that I'm concerned with the effect of using an inaccurate document length on results quality. Did you check this issue?

          I ran experiments on this a long time ago; the changes were not statistically significant.
          But there is an issue open to still switch norms to docvalues fields, for other reasons: LUCENE-3221.

          In addition - do you know about intentions to implement some more advanced ranking models (such as relevance models, mrf) in the near future?

          No, there won't be any additional work on this issue; GSoC is over.


            People

            • Assignee: Robert Muir
            • Reporter: David Mark Nemeskey
            • Votes: 1
            • Watchers: 5


                Time Tracking

                • Original Estimate: 343h
                • Remaining Estimate: 343h
                • Time Spent: Not Specified
