Lucene - Core
LUCENE-3714

add suggester that uses shortest path/wFST instead of buckets

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: modules/spellchecker
    • Labels: None
    • Lucene Fields: New

      Description

      Currently the FST suggester (really an FSA) quantizes weights into buckets (e.g. a single byte) and puts the bucket in front of the word.
      This makes it fast, but you lose granularity in your suggestions.
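
A minimal sketch of the granularity loss, assuming a simple linear quantization into 256 buckets. The class `BucketQuantizer` and its `bucket` method are hypothetical names for illustration and are not the suggester's actual code:

```java
// Hypothetical illustration of weight quantization; not Lucene's implementation.
final class BucketQuantizer {

    /** Maps a weight in [min, max] linearly onto bucket indices 0..buckets-1. */
    static int bucket(long weight, long min, long max, int buckets) {
        if (max == min) {
            return 0; // all weights are equal, so a single bucket is enough
        }
        double fraction = (double) (weight - min) / (max - min);
        // Clamp so that weight == max falls into the last bucket.
        return (int) Math.min(buckets - 1, (long) (fraction * buckets));
    }

    public static void main(String[] args) {
        // Both weights fall into bucket 0, so their relative order is lost.
        System.out.println(bucket(1000, 0, 1_000_000, 256));
        System.out.println(bucket(3000, 0, 1_000_000, 256));
    }
}
```

Once two words share a bucket, the suggester can no longer tell which of them had the higher original weight.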

      Lately the question was raised: if you build Lucene's FST with PositiveIntOutputs, does it behave the same as a tropical semiring wFST?

      In other words, after completing the word, we instead traverse min(output) at each node to find the 'shortest path' to the
      best suggestion (with the highest score).

      This means we wouldn't need to quantize weights at all, and it might make some operations (e.g. adding fuzzy matching) a lot easier.
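
The shortest-path idea can be sketched without Lucene at all. The following is a rough illustration using a plain trie rather than a real FST: each node remembers the minimum cost of any word below it, and after consuming the prefix we keep following the child that carries that minimum. The class `MinCostTrie`, its methods, and the conversion `cost = Integer.MAX_VALUE - weight` (so that the highest score becomes the shortest path) are all assumptions made for this example, not the API or the encoding used in the patch:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a weighted FST: a trie with a min cost per node.
final class MinCostTrie {

    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        long minCost = Long.MAX_VALUE; // cheapest word at or below this node
        long endCost = Long.MAX_VALUE; // cost of the word ending here, if any
    }

    private final Node root = new Node();

    /** Assumed conversion: a higher weight gives a lower cost. */
    static long cost(int weight) {
        return Integer.MAX_VALUE - (long) weight;
    }

    void add(String word, long cost) {
        Node node = root;
        node.minCost = Math.min(node.minCost, cost);
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
            node.minCost = Math.min(node.minCost, cost);
        }
        node.endCost = Math.min(node.endCost, cost);
    }

    /** Returns the lowest-cost word starting with the prefix, or null if none. */
    String best(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        if (node.minCost == Long.MAX_VALUE) {
            return null; // empty trie
        }
        StringBuilder sb = new StringBuilder(prefix);
        // Follow the min-cost child until the cheapest word ends at this node.
        while (node.endCost != node.minCost) {
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                if (e.getValue().minCost == node.minCost) {
                    sb.append(e.getKey());
                    node = e.getValue();
                    break;
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        MinCostTrie trie = new MinCostTrie();
        trie.add("wit", cost(1000));
        trie.add("wifi", cost(3000));
        trie.add("wine", cost(2000));
        System.out.println(trie.best("wi"));  // prints "wifi"
        System.out.println(trie.best("win")); // prints "wine"
    }
}
```

Because the exact weights are kept, "wifi" (3000) still outranks "wine" (2000) and "wit" (1000), which a coarse bucket scheme could have merged into one bucket.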

      1. LUCENE-3714.patch (7 kB, Robert Muir)
      2. out.png (63 kB, Dawid Weiss)
      3. LUCENE-3714.patch (18 kB, Michael McCandless)
      4. TestMe.java (4 kB, Dawid Weiss)
      5. LUCENE-3714.patch (31 kB, Robert Muir)
      6. LUCENE-3714.patch (31 kB, Robert Muir)
      7. LUCENE-3714.patch (28 kB, Robert Muir)
      8. LUCENE-3714.patch (38 kB, Robert Muir)

            People

            • Assignee: Robert Muir
            • Reporter: Robert Muir