[LUCENE-329] Fuzzy query scoring issues - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.2
Fix Version/s: 5.3, 6.0
Component/s: core/search
Labels:
None
Environment:

Operating System: All
Platform: All

Bugzilla Id:
32942

Description

Queries that automatically expand into multiple terms (wildcard, range, prefix,
fuzzy, etc.) currently suffer from two problems:

1) Scores for matching documents are significantly lower than those for plain
term queries because of the number of terms introduced (a match on the query
Foo~ scores 0.1, whereas a match on the query Foo scores 1).
2) Rarer forms of the expanded terms are favoured over more common forms
because of the IDF. With fuzzy queries, for example, rare misspellings
typically appear in the results before the more common correct spellings.
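
To make the two problems concrete, here is a minimal sketch in plain Java. It does not call Lucene; it assumes the classic scoring formulas, coord = overlap / maxOverlap and idf = 1 + ln(numDocs / (docFreq + 1)). The class name, the term spellings, the index size, and the document frequencies are all hypothetical and chosen only for illustration. The report does not say how many terms Foo~ expanded to; ten is assumed here because it reproduces a factor of 0.1.

```java
// Illustrative sketch only: plain Java, no Lucene dependency.
// All figures below are hypothetical, not taken from the report.
final class MultiTermScoringProblem {

    // Classic coordination factor: the fraction of query clauses a document matches.
    static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    // Classic inverse document frequency: rarer terms get a larger value.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 10_000;      // hypothetical index size
        int expandedTerms = 10;    // assumed number of terms produced by the expansion

        // Problem 1: a document matching one of ten expanded terms is scaled down.
        System.out.println("coord for 1 of 10 terms: " + coord(1, expandedTerms));

        // Problem 2: the rare misspelling gets a larger IDF than the common spelling.
        System.out.printf("idf(common spelling, docFreq=500) = %.3f%n", idf(500, numDocs));
        System.out.printf("idf(rare misspelling, docFreq=2)  = %.3f%n", idf(2, numDocs));
    }
}
```

Under these assumptions the coordination factor shrinks the score of every expanded query, and the IDF term ranks a document containing the misspelling above one containing the correct spelling, all else being equal.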

I will attach a patch that corrects the issues identified above by:
1) Overriding Similarity.coord to counteract the downplaying of scores
introduced by expanding terms.
2) Taking the IDF factor of the most common form of the expanded terms as the
basis for scoring all other expanded terms.
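
The proposed approach can be sketched as follows. This is not the code from the attached patch and does not use Lucene's API; it is a plain-Java illustration assuming the classic formula idf = 1 + ln(numDocs / (docFreq + 1)). The class name, method names, terms, and document frequencies are hypothetical. The neutral coordination factor of 1 is one possible reading of "overriding Similarity.coord"; the patch may do something different.

```java
// Illustrative sketch only: plain Java, no Lucene dependency.
// Names and figures are hypothetical, not taken from the patch.
final class SharedIdfSketch {

    // Classic inverse document frequency.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Idea 2: every expanded term is scored with the IDF of the most common form.
    static double sharedIdf(int[] docFreqs, int numDocs) {
        int maxDocFreq = 0;
        for (int df : docFreqs) {
            maxDocFreq = Math.max(maxDocFreq, df);
        }
        return idf(maxDocFreq, numDocs);
    }

    // Idea 1 (assumed form): a neutral coordination factor, so the number of
    // expanded terms no longer scales the score down.
    static double coord(int overlap, int maxOverlap) {
        return 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 10_000;               // hypothetical index size
        String[] terms = {"foo", "fop"};    // common spelling, rare misspelling
        int[] docFreqs = {500, 2};          // hypothetical document frequencies

        double shared = sharedIdf(docFreqs, numDocs);
        for (int i = 0; i < terms.length; i++) {
            System.out.printf("%s: per-term idf=%.3f, shared idf=%.3f%n",
                    terms[i], idf(docFreqs[i], numDocs), shared);
        }
    }
}
```

With a shared IDF, the rare misspelling no longer outranks the common spelling purely because it is rare; any remaining ordering comes from the other scoring factors.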

Attachments

LUCENE-329.patch, 05/May/15 12:39, 11 kB, Mark Harwood
LUCENE-329.patch, 12/May/15 10:33, 12 kB, Mark Harwood
LUCENE-329.patch, 19/May/15 10:21, 13 kB, Mark Harwood
LUCENE-329.patch, 19/May/15 11:52, 13 kB, Mark Harwood
ASF.LICENSE.NOT.GRANTED--patch.txt, 05/Jan/05 06:39, 11 kB, Mark Harwood

Issue Links

is duplicated by: LUCENE-124 Fuzzy Searches do not get a boost of 0.2 as stated in "Query Syntax" doc (Closed)

relates to: LUCENE-1424 Change all multi-term querys so that they extend MultiTermQuery and allow for a constant score mode (Closed)

supersedes: LUCENE-6476 Split logic from TermContext.register (Closed)

People

Assignee: Mark Harwood

Reporter: Mark Harwood

Votes: 3

Watchers: 8

Dates

Created: 05/Jan/05 06:34

Updated: 28/Aug/22 11:20

Resolved: 20/May/15 14:54