Lucene - Core
LUCENE-5030

FuzzySuggester has to operate on FSTs of Unicode letters (code points), not UTF-8 bytes, to work correctly for both 1-byte (e.g. English) and multi-byte (non-Latin) letters

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.3
    • Fix Version/s: 4.5, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      There is a limitation in the current FuzzySuggester implementation: it computes edits in UTF-8 space instead of Unicode character (code point) space.

      This should be fixable: we'd need to fix TokenStreamToAutomaton to work in Unicode character space, then fix FuzzySuggester to do the same steps that FuzzyQuery does: do the LevN expansion in Unicode character space, then convert that automaton to UTF-8, then intersect with the suggest FST.
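
      For illustration, a minimal sketch of that pipeline (a hypothetical helper assuming the 4.x automaton APIs; not the committed patch):

        // Hedged sketch: fuzz in Unicode code point space, then convert the
        // automaton to UTF-8 bytes before intersecting with the suggest FST.
        import org.apache.lucene.util.automaton.Automaton;
        import org.apache.lucene.util.automaton.BasicOperations;
        import org.apache.lucene.util.automaton.LevenshteinAutomata;
        import org.apache.lucene.util.automaton.UTF32ToUTF8;

        class FuzzyPipelineSketch {
          static Automaton toFuzzyUtf8Automaton(String key, int maxEdits) {
            // Decode the key into code points (not UTF-16 chars or UTF-8 bytes).
            int[] codePoints = new int[Character.codePointCount(key, 0, key.length())];
            for (int i = 0, j = 0; i < key.length(); j++) {
              codePoints[j] = key.codePointAt(i);
              i += Character.charCount(codePoints[j]);
            }
            // LevN expansion in code point space, as FuzzyQuery does.
            Automaton unicode =
                new LevenshteinAutomata(codePoints, Character.MAX_CODE_POINT, false)
                    .toAutomaton(maxEdits);
            // Convert to the UTF-8 byte equivalent (as CompiledAutomaton does),
            // so it can be intersected with the byte-based suggest FST.
            Automaton utf8 = new UTF32ToUTF8().convert(unicode);
            BasicOperations.determinize(utf8);
            return utf8;
          }
        }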

      See the discussion here: http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none

      Attachments

      1. benchmark-INFO_SEP.txt (4 kB) - Artem Lukanin
      2. benchmark-old.txt (4 kB) - Artem Lukanin
      3. benchmark-wo_convertion.txt (4 kB) - Artem Lukanin
      4. LUCENE-5030.patch (28 kB) - Artem Lukanin
      5. LUCENE-5030.patch (29 kB) - Artem Lukanin
      6. LUCENE-5030.patch (30 kB) - Artem Lukanin
      7. LUCENE-5030.patch (29 kB) - Michael McCandless
      8. LUCENE-5030.patch (178 kB) - Artem Lukanin
      9. LUCENE-5030.patch (175 kB) - Artem Lukanin
      10. nonlatin_fuzzySuggester_combo.patch (187 kB) - Artem Lukanin
      11. nonlatin_fuzzySuggester_combo1.patch (37 kB) - Artem Lukanin
      12. nonlatin_fuzzySuggester_combo2.patch (184 kB) - Artem Lukanin
      13. nonlatin_fuzzySuggester.patch (168 kB) - Artem Lukanin
      14. nonlatin_fuzzySuggester.patch (167 kB) - Artem Lukanin
      15. nonlatin_fuzzySuggester.patch (68 kB) - Artem Lukanin
      16. nonlatin_fuzzySuggester1.patch (169 kB) - Artem Lukanin
      17. nonlatin_fuzzySuggester2.patch (135 kB) - Artem Lukanin
      18. nonlatin_fuzzySuggester3.patch (117 kB) - Artem Lukanin
      19. nonlatin_fuzzySuggester4.patch (118 kB) - Artem Lukanin
      20. run-suggest-benchmark.patch (2 kB) - Michael McCandless

        Activity

        Artem Lukanin added a comment -

        I've added a test which demonstrates the bug. I have fixed TokenStreamToAutomaton, but I have no idea how to update AnalyzingSuggester, which wants bytes instead of chars (ints, which cannot fit in a byte).

        Michael McCandless added a comment -

        Thanks Artem!

        I think we need to change POS_SEP and HOLE? 256/257 are no longer safe to use?

        Also, I think TokenStreamToAutomaton should operate with Unicode code points, not code units (Java's char)? Probably we should fork TS2A to a new class (TokenStreamToUnicodeAutomaton or something?).

        Then, in FuzzySuggester, after calling toLevenshteinAutomata and before calling FSTUtil.intersectPrefixPaths, we need to insert a call to UTF32ToUTF8().convert(automaton), just like in CompiledAutomaton.java. That call will translate the fuzzed-up unicode Automaton into the UTF8 equivalent, and then we should be able to pass that on to intersect.

        Also, I think you'll need to fix that 255 in FuzzySuggester.toLevenshteinAutomata ... in fact, just remove it, so we use the ctor that passes max unicode code point.

        Artem Lukanin added a comment -

        Now all the tests pass except testRandom when preserveSep is true.

        Michael, can you explain to me how this preserve-separator feature works?

        Michael McCandless added a comment -

        Hmm, POS_SEP and HOLE are still eating into Unicode's space (the last 2 code points)? Maybe we should just use Integer.MAX_VALUE and MAX_VALUE-1?

        In fact I think you'll need to make POS_SEP and HOLE consistent across both of the TS2A classes. I think you should just define them in TokenStreamToAutomaton.java, and then reference those constants from TokenStreamToUnicodeAutomaton.java?

        There's a lot of whitespace noise here ... can you remove those changes so we can more easily see the "real" changes? Thanks.

        The preserveSep option means when finding a match we must "respect" the boundaries between tokens. It could be you're seeing tests fail because POS_SEP was different between the two classes?

        Artem Lukanin added a comment -

        Sorry for the autoformatting; I will upload the patch without it.
        Since we use Integer.MAX_VALUE, we do not need EscapingTokenStreamToUnicodeAutomaton any more, because the text will not contain such a code point?

        Artem Lukanin added a comment -

        BTW, if I replace it with Integer.MAX_VALUE, UTF32ToUTF8().convert(unicodeAutomaton) will not work any more: it will not convert Integer.MAX_VALUE into a valid byte sequence (it converts it into 1fff bf bf bf), because Integer.MAX_VALUE is not a valid code point (the maximum is U+10FFFF). So I guess we still need EscapingTokenStreamToUnicodeAutomaton.

        Artem Lukanin added a comment -

        the patch without autoformatting

        Artem Lukanin added a comment -

        I see the patch still has autoformatting of some spaces. Sorry, I guess I cannot stop IntelliJ IDEA from doing it.

        Artem Lukanin added a comment -

        with untouched trailing spaces

        Michael McCandless added a comment -

        Oh, right, we can't just use MAX_VALUE: it must be a valid unicode char since we will send it through UTF32toUTF8.

        Also, it's easiest if that char survives to UTF8 as a single byte to keep replaceSep [relatively] simple.

        Maybe we "steal" two unicode chars? Maybe INFO_SEP (U+001F) and INFO_SEP2 (U+001E), and we document that these chars are not allowed on the input? (We could also try, maybe later as a separate issue, to escape them when they occur, like EscapingTokenStreamToUnicodeAutomaton now does if it sees 0xFF on the input).

        Artem Lukanin added a comment -

        You already have

        private static final int PAYLOAD_SEP = '\u001f';

        in AnalyzingSuggester.

        Artem Lukanin added a comment -

        I have fixed testRandom, which repeats the logic of FuzzySuggester.
        Now all the tests pass.
        Please review.

        Artem Lukanin added a comment -

        I see that some tests in AnalyzingSuggesterTest fail, so I have to look into what's wrong...

        Artem Lukanin added a comment -

        Now tests in FuzzySuggesterTest and AnalyzingSuggesterTest pass, except for AnalyzingSuggesterTest.testRandom (when preserveSep = true).

        If I enable VERBOSE, I see that the suggestions are correct. I guess there is a bug in the test, but I cannot find it.

        Can you please review?

        Robert Muir added a comment -

        I don't think changing SEP_LABEL from a single byte to 4 bytes is necessarily a good idea.

        I think benchmarks (size and speed) should be run on this change before we jump into it. I'm also concerned about the determinization and such being in the middle of an autosuggest request ... this seems like it would be way, way too slow.

        Artem Lukanin added a comment -

        Possibly we should change it to INFO_SEP2 (U+001E), as Michael suggested for TokenStreamToAutomaton?
        Do you like the 0x10ffff and 0x10fffe separators in TokenStreamToAutomaton? Won't they slow down the process?
        I guess Michael is the one who runs benchmarks regularly? I don't know how to do it...

        Michael McCandless added a comment -

        The easy performance tester to run is lucene/suggest/src/test/org/apache/lucene/search/suggest/LookupBenchmarkTest.java ... we should test that first I think? I can also run one based on FreeDB ... the sources are in luceneutil (https://code.google.com/a/apache-extras.org/p/luceneutil/ ).

        If the perf hit is too much then one option would be to make it optional (whether we count edits in Unicode space or UTF-8 space), or maybe just another suggester class (FuzzyUnicodeSuggester?).

        I think we can use INFO_SEP: yes, this is used for PAYLOAD_SEP, but that only means the incoming surfaceForm cannot contain this char, I think? So ... I think we are free to use it in the analyzed form? Or did something go wrong when you tried?

        Whichever chars we use (steal), we should add checks that these chars do not occur in the input...

        Artem Lukanin added a comment -

        I ran this command:

        ant -Dtestcase=LookupBenchmarkTest clean test

        and got the same results for the patched and the original version:

        [junit4:junit4] Tests summary: 1 suite, 0 tests
             [echo] 5 slowest tests:
        [junit4:tophints]  22.95s | org.apache.lucene.search.spell.TestSpellChecker
        [junit4:tophints]  22.70s | org.apache.lucene.search.suggest.analyzing.AnalyzingSuggesterTest
        [junit4:tophints]  15.08s | org.apache.lucene.search.suggest.fst.TestSort
        [junit4:tophints]  11.84s | org.apache.lucene.search.suggest.fst.FSTCompletionTest
        [junit4:tophints]  11.24s | org.apache.lucene.search.suggest.analyzing.FuzzySuggesterTest
        
        Artem Lukanin added a comment -

        I used INFO_SEP and INFO_SEP2 for separators and holes. All the tests pass (I have fixed AnalyzingSuggesterTest.testStolenBytes). The benchmark is improved:

        [junit4:junit4] Suite: org.apache.lucene.search.suggest.LookupBenchmarkTest
        [junit4:junit4] Completed in 0.04s, 0 tests
        [junit4:junit4] 
        [junit4:junit4] JVM J0:     1.64 ..     2.34 =     0.71s
        [junit4:junit4] Execution time total: 2.36 sec.
        [junit4:junit4] Tests summary: 1 suite, 0 tests
             [echo] 5 slowest tests:
        [junit4:tophints]  22.95s | org.apache.lucene.search.spell.TestSpellChecker
        [junit4:tophints]  15.08s | org.apache.lucene.search.suggest.fst.TestSort
        [junit4:tophints]  13.41s | org.apache.lucene.search.suggest.analyzing.AnalyzingSuggesterTest
        [junit4:tophints]  11.84s | org.apache.lucene.search.suggest.fst.FSTCompletionTest
        [junit4:tophints]  10.78s | org.apache.lucene.search.suggest.analyzing.FuzzySuggesterTest
        
        Michael McCandless added a comment -

        Hi Artem,

        Sorry, running the LookupBenchmarkTest is tricky ... you need to make temporary changes in 3 places. I'm attaching a patch that should let you run it by just doing "ant test -Dtestcase=LookupBenchmarkTest".

        Artem Lukanin added a comment -

        OK, in general the performance is twice as bad:
        before my patch: Total time: 9 minutes 20 seconds
        after my patch with INFO_SEP and INFO_SEP2: Total time: 18 minutes 31 seconds

        I guess the reason is UTF32ToUTF8().convert(unicodeAutomaton), so it would be better to add the correct transitions on the fly ... but possibly you can postpone this for another issue?

        Michael McCandless added a comment -

        Hmm can you post the full output of the benchmark? It measures different things in each test case.

        Given this, I guess we should make a separate fuzzy suggester class (that measures edit distance in Unicode code point space)? Or make it a boolean option on the current class ...

        Can you also post your last patch (cutting over to INFO_SEP/2 for the stolen chars)?

        Artem Lukanin added a comment -

        The last patch with INFO_SEP/2 was posted today 20/Jun/13 at 10:52

        Michael McCandless added a comment -

        Oh, woops, I missed it. Thanks.

        Artem Lukanin added a comment (edited) -

        I'm uploading 3 results of benchmarking:
        1) benchmark-old.txt - before the patch
        2) benchmark-INFO_SEP.txt - after the patch
        3) benchmark-wo_convertion.txt - after the patch, but with the conversion lines commented out in several places, like

              //Automaton utf8lookupAutomaton = new UTF32ToUTF8().convert(lookupAutomaton);
              //BasicOperations.determinize(utf8lookupAutomaton);

        because they are useless for words consisting only of Latin letters.

        As you can see, the conversion takes too much time.

        Artem Lukanin added a comment -

        OK, I will add a new option UNICODE_AWARE = 4, which will switch the conversion ON.
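
        For context, a sketch of how such a bit flag combines with the existing AnalyzingSuggester options (EXACT_FIRST = 1, PRESERVE_SEP = 2); the flag's final name and home change later in this thread:

          // Proposed option (sketch): the next free bit after the existing flags.
          public static final int UNICODE_AWARE = 4;

          // Callers would OR it in, and the suggester would test for it:
          // int options = EXACT_FIRST | PRESERVE_SEP | UNICODE_AWARE;
          // boolean unicodeAware = (options & UNICODE_AWARE) != 0;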

        Michael McCandless added a comment -

        I'm a little confused by the results. E.g., how come AnalyzingSuggester changes so much between old (57 kQPS), INFO_SEP (27 kQPS), and wo_conversion (41 kQPS) at the 2-4 prefix lengths? And why are the unrelated impls (TSS, FST, WFST, Jaspell) changing so much? Maybe this is just horrible Hotspot noise?

        Michael McCandless added a comment -

        Oh, duh, the conversion from Unicode -> UTF8, and the determinization, are in AnalyzingSuggester ... so it makes sense that it got slower.

        I agree we should add a UNICODE_AWARE option.

        Artem Lukanin added a comment -

        I have added the UNICODE_AWARE option in Lucene and Solr. Should I create a separate Solr issue for the updated Solr files, or can I upload a patch for all the updated files?

        Artem Lukanin added a comment -

        I have uploaded a lucene/solr combo patch with the new UNICODE_AWARE option.

        Michael McCandless added a comment -

        Thanks Artem!

        I don't understand why we needed to change AnalyzingSuggesterTest.testStolenBytes? That implies something is wrong w/ the escaping, I think? (I.e., results in that test should not change whether SEP is preserved or not.) So I'm confused what changed...

        Also, I think we don't need the check for SEP_LABEL in AnalyzingSuggester.lookup (that throws IllegalArgumentException)? (We "escape" this char.) But we should check for HOLE and throw IllegalArgumentException, since we don't escape it. And could you add a test confirming you get that exc if you try to add HOLE? Thanks.
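
        A sketch of such a test (scaffolding assumed from the 4.x suggest test suite; the actual test in the patch may differ):

          public void testIllegalLookupArgument() throws Exception {
            AnalyzingSuggester suggester = new AnalyzingSuggester(new MockAnalyzer(random()));
            suggester.build(new TermFreqArrayIterator(new TermFreq[] {
                new TermFreq("a", 50)
            }));
            try {
              suggester.lookup("a\u001E", false, 1); // key contains HOLE (U+001E)
              fail("should have thrown IllegalArgumentException");
            } catch (IllegalArgumentException expected) {
              // expected: HOLE is not escaped, so it must be rejected
            }
          }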

        Artem Lukanin added a comment -

        Sorry, I don't understand why testStolenBytes worked before. I have restored it and now it fails. Can you please suggest what I did wrong?

        As I understand it, if we do not preserve the separator, one token containing a separator and two separate tokens (which are actually one string with a separator) become equal after removing the separator in replaceSep, so we should get 2 results instead of 1 when we do a lookup. No?

        I've added a test for the IllegalArgumentException.

        Artem Lukanin added a comment (edited) -

        I have restored testStolenBytes completely and now all the tests pass (see nonlatin_fuzzySuggester_combo2.patch).

        But I'm not sure what you meant by the 0xff byte in

        token(new BytesRef(new byte[] {0x61, (byte) 0xff, 0x61}))

        - the letter ÿ, or SEP_LABEL?

        Now it is treated as the letter ÿ, but in the previous modification of the test I treated it as SEP_LABEL.

        Michael McCandless added a comment -

        Hmm, testStolenBytes should be using the 0x1f byte ... the intention of the test is to ensure that an incoming token that contains SEP_LABEL still works correctly (i.e., that the escaping we do is working).

        When I change the 0xff in the patch back to 0x1f I indeed see the (unexpected) failure without the PRESERVE_SEP option, which is curious because we do no escaping without PRESERVE_SEP.

        OK I see the issue: before, when POS_SEP was 256 and the input space was a byte, replaceSep always worked correctly because there was no way for any byte input to be confused with POS_SEP. But now that we are increasing the input space to all unicode chars, there is no "safe" value for POS_SEP.

        OK, given all this I think we should stop trying to not-steal the byte: I think we should simply declare that we steal both 0x1e and 0x1f. This means we can remove the escaping code, put back your previous code that I had asked you to remove (sorry) that threw IAE on 0x1f (and now also 0x1e), remove testStolenBytes, and then improve your new testIllegalLookupArgument to also verify that 0x1f gets the IllegalArgumentException?

        Also, we could maybe eliminate some code dup here, e.g. the two toFiniteStrings ... maybe by having TS2A and TS2UA share a base class / interface. Hmm, maybe we should just merge TS2UA back into TS2A, and add a unicodeAware option to it?

        Artem Lukanin added a comment -

        Done. Please review LUCENE-5030.patch.

        Artem Lukanin added a comment -

        BTW, for your

        // TODO: is there a Reader from a CharSequence?

        in AnalyzingSuggester.toLookupAutomaton: there is org.apache.commons.io.input.CharSequenceReader.
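
        (Hypothetical usage; note commons-io is not a Lucene core dependency:)

          import java.io.Reader;
          import org.apache.commons.io.input.CharSequenceReader;

          Reader reader = new CharSequenceReader(key); // key is any CharSequence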

        Michael McCandless added a comment -

        Thanks Artem, new patch looks good.

        Thanks for the tip about org.apache.commons.io.input.CharSequenceReader! I'll update the TODO with this information, but I don't think we should pull in a dep on commons for this.

        Michael McCandless added a comment -

        I plan to commit the last patch soon ... thanks Artem!

        Artem Lukanin added a comment -

        Cool!

        Robert Muir added a comment -
        +  /** Include this flag in the options parameter to {@link
        +   *  #AnalyzingSuggester(Analyzer,Analyzer,int,int,int)} if
        +   *  you want your suggester to operate non-ASCII letters. */
        +  public static final int UNICODE_AWARE = 4;
        

        Errr ... this implies either that there is something wrong with AnalyzingSuggester, or that this option actually does something for AnalyzingSuggester, when in fact it only changes the behavior of FuzzySuggester.

        Can we fix this?

        Artem Lukanin added a comment -

        The javadocs are fixed.

        Michael McCandless added a comment -

        Maybe we should rename UNICODE_AWARE to FUZZY_UNICODE_AWARE? (Because AnalyzingSuggester itself is already unicode aware... so this flag only impacts FuzzySuggester.)

        Michael McCandless added a comment -

        Hmm also "ant precommit" is failing ...

        Artem Lukanin added a comment -

        In ant precommit I get this error:

        extra-target.xml:68: javax.script.ScriptException: javax.script.ScriptException: org.tmatesoft.svn.core.SVNException: svn: E155007: 'lucene-solr' is not a working copy

        I use git, not SVN, so I'm a bit confused...

        What error do you get?

        Michael McCandless added a comment -

        OK no problem, I can fix it. The javadocs linter is angry that isUnicodeAware has no text (only an @return) ...

        Michael McCandless added a comment -

        New patch, fixing the linter error, renaming UNICODE_AWARE -> FUZZY_UNICODE_AWARE, and fixing one compilation warning ... I think it's ready.

        Artem Lukanin added a comment -

        I have renamed the variables in comments and tests for consistency.

        Michael McCandless added a comment -

        Sorry for the long delay here ...

        Just to verify: there is no point in passing FUZZY_UNICODE_AWARE to AnalyzingSuggester, right?

        In which case, I think AnalyzingLookupFactory should not be changed?

        But, furthermore, I think we can isolate the changes to FuzzySuggester? E.g., move the FUZZY_UNICODE_AWARE flag down to FuzzySuggester, fix its ctor to strip that option when calling super(), move isFuzzyUnicodeAware down as well, and then override toLookupAutomaton to do the utf8 conversion + det?

        This way it's not even possible to send the fuzzy flag to AnalyzingSuggester.
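
        A sketch of that isolation (assuming an overridable convertAutomaton hook in AnalyzingSuggester; the final patch's signatures may differ):

          public class FuzzySuggester extends AnalyzingSuggester {
            /** Measure fuzzy edits in Unicode code points instead of UTF-8 bytes. */
            public static final int FUZZY_UNICODE_AWARE = 4;

            private final boolean unicodeAware;

            public FuzzySuggester(Analyzer analyzer, int options) {
              // Strip the fuzzy-only flag before handing options to the parent:
              super(analyzer, analyzer, options & ~FUZZY_UNICODE_AWARE, 256, -1);
              this.unicodeAware = (options & FUZZY_UNICODE_AWARE) != 0;
            }

            @Override
            protected Automaton convertAutomaton(Automaton a) {
              if (!unicodeAware) {
                return a;
              }
              // Unicode -> UTF-8 conversion + determinization, only when requested:
              Automaton utf8 = new UTF32ToUTF8().convert(a);
              BasicOperations.determinize(utf8);
              return utf8;
            }
          }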

        Artem Lukanin added a comment -

        Then I have to override (and copy a lot of code in) getTokenStreamToAutomaton, lookup, and toFiniteStrings, and make a lot of private variables protected.
        I think this is not a good idea.

        Artem Lukanin added a comment -

        Moved the parameter from AnalyzingLookupFactory to FuzzyLookupFactory

        Artem Lukanin added a comment -

        Michael, I got your idea. I will refactor the code not to use FUZZY_UNICODE_AWARE in AnalyzingSuggester.

        Artem Lukanin added a comment -

        The code is refactored not to touch AnalyzingSuggester. Please review.

        Michael McCandless added a comment -

        Patch looks great! Thanks Artem. No more mixing in of fuzzy-ness into AnalyzingSuggester.

        It looks like we are doing the utf8 conversion + det twice per lookup, once in convertAutomaton and once in getFullPrefixPaths. But I think this is inevitable: the first conversion is on the "straight" automaton, for the exactFirst match, and the second one is on the lev automaton, for non-exactFirst.

        Really we should only do the first convertAutomaton if exactFirst is true ... but this is an optimization so we don't need to fix it now.

        ASF subversion and git services added a comment -

        Commit 1504490 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1504490 ]

        LUCENE-5030: FuzzySuggester can optionally measure edits in Unicode code points instead of UTF8 bytes

        ASF subversion and git services added a comment -

        Commit 1504492 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1504492 ]

        LUCENE-5030: FuzzySuggester can optionally measure edits in Unicode code points instead of UTF8 bytes

        Michael McCandless added a comment -

        OK I committed the last patch with a few small fixes:

        • Added @lucene.experimental to FuzzySuggester
        • Removed the added ctor (so we have just two ctors: the easy one,
          which uses all defaults, and the expert one, where you specify
          everything)
        • Removed System.out.printlns from the test

        Thanks Artem!

        Uwe Schindler added a comment -

        JUHUUUU! Thanks for heavy committing - it took a long time, but now it is good! Many thanks, Uwe

        Artem Lukanin added a comment -

        Great! Thanks for reviewing.

        Adrien Grand added a comment -

        4.5 release -> bulk close


          People

          • Assignee: Michael McCandless
          • Reporter: Artem Lukanin
          • Votes: 0
          • Watchers: 5
