[LUCENE-736] Sloppy Phrase Scorer matches the doc "A B C D E" for query = "B C B"~2 - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.2
Component/s: core/search
Labels:
None

Lucene Fields: Patch Available

Description

This is an extension of https://issues.apache.org/jira/browse/LUCENE-697

In addition to the abnormalities Yonik pointed out in LUCENE-697, there seem to be other issues with sloppy phrase search and scoring.

1) A phrase with a repeated word would be detected in a document even though it is not there.
E.g. for document = A B D C E, query = "B C B" does not find this document (as expected), but query "B C B"~2 does find it.
I think that no matter how large the slop is, this document should not be a match: it contains only one B, while the query needs two.
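
A minimal sketch of the expected behavior for item 1, assuming a simplified slop measure: each query term i is placed at a document position p, its adjusted position is p - i, and the phrase matches when max(adjusted) - min(adjusted) <= slop. This is a hypothetical brute-force reference model, not Lucene's SloppyPhraseScorer; the class `SloppyRepeatCheck`, the method `matches` and the flag `distinctPositions` are invented for illustration. With the flag off, both B terms of "B C B" may reuse the single B of the title document "A B C D E" and the query matches with slop 2; with the flag on, repeated query terms must land on different positions, so no slop can make it match.

```java
// Hypothetical brute-force reference model, NOT Lucene's SloppyPhraseScorer.
// Assumed slop measure: query term i placed at document position p has adjusted
// position p - i; the phrase matches when max(adjusted) - min(adjusted) <= slop.
class SloppyRepeatCheck {

    // When distinctPositions is true, two query terms may never share one document position.
    static boolean matches(String[] doc, String[] query, int slop, boolean distinctPositions) {
        if (query.length == 0) return false;
        return place(doc, query, slop, distinctPositions, 0, new int[query.length], new boolean[doc.length]);
    }

    // Tries every assignment of query terms k..end to document positions (exponential; illustration only).
    private static boolean place(String[] doc, String[] query, int slop, boolean distinct,
                                 int k, int[] assigned, boolean[] used) {
        if (k == query.length) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int i = 0; i < query.length; i++) {
                int adjusted = assigned[i] - i; // document position minus offset in the query
                min = Math.min(min, adjusted);
                max = Math.max(max, adjusted);
            }
            return max - min <= slop;
        }
        for (int p = 0; p < doc.length; p++) {
            if (!doc[p].equals(query[k])) continue;
            if (distinct && used[p]) continue; // position already taken by an earlier query term
            assigned[k] = p;
            if (distinct) used[p] = true;
            boolean found = place(doc, query, slop, distinct, k + 1, assigned, used);
            if (distinct) used[p] = false;
            if (found) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] doc = "A B C D E".split(" ");
        String[] query = {"B", "C", "B"};
        System.out.println(matches(doc, query, 2, false)); // true: both B terms reuse position 1
        System.out.println(matches(doc, query, 2, true));  // false: the document has only one B
    }
}
```

Requiring distinct positions is what makes the answer independent of the slop: a document with fewer occurrences of a term than the query has can never be matched.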

2) A document containing both orders of a query, symmetrically, would score differently for the query and for its reversed form.
E.g. document = A B C B A would score differently for queries "B C"~2 and "C B"~2, although it is symmetric with respect to both.
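
A small check of the symmetry argument in item 2, assuming a hypothetical reference model for two-term queries (not Lucene's scorer): it enumerates every (first, second) occurrence pair rather than walking positions in query order, takes the distance as how far the second term is from the position right after the first (so a transposition costs 2, as with PhraseQuery slop), and weights each pair with 1 / (distance + 1), the formula Lucene's DefaultSimilarity.sloppyFreq uses. The names `SymmetricSloppyFreq` and `pairFreq` are invented for illustration. For A B C B A, "B C" sees pairs at distances 0 and 2; mirroring the document maps these onto the pairs for "C B", so both orders get 1/(0+1) + 1/(2+1) = 4/3.

```java
// Hypothetical reference model for two-term sloppy phrases, NOT Lucene's scorer.
// Every (first, second) occurrence pair within the slop contributes 1 / (distance + 1),
// the weighting used by DefaultSimilarity.sloppyFreq.
class SymmetricSloppyFreq {

    static double pairFreq(String[] doc, String first, String second, int slop) {
        double freq = 0.0;
        for (int i = 0; i < doc.length; i++) {
            if (!doc[i].equals(first)) continue;
            for (int j = 0; j < doc.length; j++) {
                if (j == i || !doc[j].equals(second)) continue;
                // In an exact match "second" sits at i + 1; distance is how far it is from there.
                int distance = Math.abs(j - (i + 1));
                if (distance <= slop) freq += 1.0 / (distance + 1);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        String[] doc = "A B C B A".split(" ");
        // The document is a palindrome, so both orders see the same distances {0, 2}.
        System.out.println(pairFreq(doc, "B", "C", 2));
        System.out.println(pairFreq(doc, "C", "B", 2)); // same value as the line above
    }
}
```

The model is only order-independent for mirror-symmetric documents such as this one; for "B C A" the two orders legitimately differ, because the reversed query needs a transposition.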

I will attach test cases that show both these problems and the one reported by Yonik in 697.

Attachments

- perf-search-new.log (01/Dec/06 10:24, 184 kB, Doron Cohen)
- perf-search-orig.log (01/Dec/06 10:24, 184 kB, Doron Cohen)
- res-search-new2.log (02/Dec/06 08:33, 363 kB, Doron Cohen)
- res-search-orig2.log (02/Dec/06 08:33, 364 kB, Doron Cohen)
- sloppy_phrase_java.patch.txt (01/Dec/06 10:24, 12 kB, Doron Cohen)
- sloppy_phrase_tests.patch.txt (01/Dec/06 09:24, 12 kB, Doron Cohen)
- sloppy_phrase.patch2.txt (02/Dec/06 08:33, 25 kB, Doron Cohen)
- sloppy_phrase.patch3.txt (24/Apr/07 05:39, 30 kB, Doron Cohen)
- sloppy_phrase.patch3.txt (04/Dec/06 21:49, 27 kB, Doron Cohen)

Issue Links

relates to: LUCENE-697 Scorer.skipTo affects sloppyPhrase scoring (Resolved)

People

Assignee: Doron Cohen

Reporter: Doron Cohen

Votes: 0

Watchers: 0

Dates

Created: 01/Dec/06 09:13

Updated: 28/Aug/22 11:32

Resolved: 24/Apr/07 05:37