[LUCENE-3714] add suggester that uses shortest path/wFST instead of buckets - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.6, 4.0-ALPHA
Component/s: modules/spellchecker
Labels: None
Lucene Fields: New

Description

Currently the FST suggester (really an FSA) quantizes weights into buckets (e.g. a single byte) and puts the bucket in front of the word.
This makes it fast, but you lose granularity in your suggestions: weights that land in the same bucket can no longer be ranked against each other.
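
To make the loss of granularity concrete, here is a minimal sketch of bucketing, assuming a simple linear mapping of raw weights onto 256 buckets (one byte). The class and method names are hypothetical and this is not the suggester's actual code; it only illustrates how two different weights can collapse into the same bucket.

```java
final class BucketQuantizationSketch {
    // Hypothetical helper: map a raw weight in [min, max] onto one of
    // `buckets` integer buckets using linear scaling (an assumption).
    static int bucket(long weight, long min, long max, int buckets) {
        if (max == min) {
            return 0; // degenerate range: everything shares one bucket
        }
        double scaled = (double) (weight - min) / (max - min); // in [0, 1]
        // Clamp so that weight == max falls into the last bucket.
        return (int) Math.min(buckets - 1, (long) (scaled * buckets));
    }

    public static void main(String[] args) {
        long min = 0, max = 1_000_000;
        // Two distinct weights that end up in the same single-byte bucket,
        // so their relative order is lost after quantization.
        System.out.println(bucket(500_000, min, max, 256)); // 128
        System.out.println(bucket(501_000, min, max, 256)); // 128
    }
}
```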

Lately the question was raised: if you build Lucene's FST with PositiveIntOutputs, does it behave the same as a tropical semiring wFST?

In other words, after completing the word, we would instead traverse min(output) at each node to find the 'shortest path' to the best suggestion (the one with the highest score).

This means we wouldn't need to quantize weights at all, and it might make some operations (e.g. adding fuzzy matching) a lot easier.
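
The idea can be sketched with a plain trie standing in for the FST. This is a toy illustration, not Lucene's FST API: the class `MinOutputTrieSketch` and all of its members are hypothetical. It assumes each suggestion carries a non-negative cost where lower is better (for example, cost = maxWeight - weight), and that the minimum cost of each subtree is pushed toward the root, so the arc output is the difference between the child's and the parent's minimum. Following the arc with the smallest output at every node then reaches the best completion without any bucketing.

```java
import java.util.Map;
import java.util.TreeMap;

final class MinOutputTrieSketch {
    static final class Node {
        final Map<Character, Node> arcs = new TreeMap<>();
        long minCost = Long.MAX_VALUE; // smallest cost of any word at or below this node
        long wordCost = -1;            // cost of the word ending here, or -1 if none
    }

    private final Node root = new Node();

    // Assumption: costs are non-negative and a LOWER cost is a BETTER suggestion.
    void add(String word, long cost) {
        Node node = root;
        node.minCost = Math.min(node.minCost, cost);
        for (char c : word.toCharArray()) {
            node = node.arcs.computeIfAbsent(c, k -> new Node());
            node.minCost = Math.min(node.minCost, cost);
        }
        node.wordCost = node.wordCost < 0 ? cost : Math.min(node.wordCost, cost);
    }

    // Output on the arc parent -> child once the shared minimum is pushed toward the root.
    static long arcOutput(Node parent, Node child) {
        return child.minCost - parent.minCost;
    }

    // Returns the cheapest completion of `prefix`, or null if nothing matches.
    String best(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.arcs.get(c);
            if (node == null) {
                return null;
            }
        }
        StringBuilder sb = new StringBuilder(prefix);
        // Stop once the word ending at this node is itself the cheapest completion.
        while (node.wordCost != node.minCost) {
            Node bestChild = null;
            char bestLabel = 0;
            for (Map.Entry<Character, Node> e : node.arcs.entrySet()) {
                if (bestChild == null
                        || arcOutput(node, e.getValue()) < arcOutput(node, bestChild)) {
                    bestChild = e.getValue();
                    bestLabel = e.getKey();
                }
            }
            if (bestChild == null) {
                return null; // empty trie
            }
            sb.append(bestLabel);
            node = bestChild;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        MinOutputTrieSketch trie = new MinOutputTrieSketch();
        trie.add("wifi", 10);
        trie.add("wiki", 3);
        trie.add("wikipedia", 1);
        trie.add("window", 7);
        System.out.println(trie.best("wi"));  // wikipedia
        System.out.println(trie.best("win")); // window
    }
}
```

This greedy walk only returns the single best completion; returning the top N would need a priority queue over partial paths, which the sketch leaves out.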

Attachments

- LUCENE-3714.patch, 19/Feb/12 05:49, 38 kB, Robert Muir
- LUCENE-3714.patch, 30/Jan/12 00:32, 28 kB, Robert Muir
- LUCENE-3714.patch, 26/Jan/12 21:07, 31 kB, Robert Muir
- LUCENE-3714.patch, 26/Jan/12 03:58, 31 kB, Robert Muir
- TestMe.java, 22/Jan/12 20:04, 4 kB, Dawid Weiss
- LUCENE-3714.patch, 22/Jan/12 19:56, 18 kB, Michael McCandless
- out.png, 21/Jan/12 20:25, 63 kB, Dawid Weiss
- LUCENE-3714.patch, 21/Jan/12 19:14, 7 kB, Robert Muir

Issue Links

relates to: SOLR-2761 FSTLookup should use long-tail like discretization instead of proportional (linear) (Closed)

People

Assignee: Robert Muir
Reporter: Robert Muir
Votes: 0
Watchers: 1

Dates

Created: 21/Jan/12 19:12
Updated: 28/Aug/22 13:06
Resolved: 19/Feb/12 17:19