  Lucene - Core / LUCENE-3714

add suggester that uses shortest path/wFST instead of buckets

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: modules/spellchecker
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Currently the FST suggester (really an FSA) quantizes weights into buckets (e.g. a single byte) and puts the bucket in front of the word.
      This makes it fast, but you lose granularity in your suggestions.
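A minimal Java sketch of the bucketing idea, assuming weights are longs in a known [min, max] range and 256 buckets (one byte). The names `BucketSketch`, `bucket` and `encode` are hypothetical and are not Lucene's FSTCompletion API; the sketch only shows how two distinct weights can collapse into the same bucket once it is prepended to the term.

```java
import java.nio.charset.StandardCharsets;

public class BucketSketch {
    // Map a weight in [minWeight, maxWeight] onto one of `buckets` levels.
    // Assumes minWeight <= weight <= maxWeight.
    public static int bucket(long weight, long minWeight, long maxWeight, int buckets) {
        if (maxWeight == minWeight) {
            return 0;
        }
        double normalized = (double) (weight - minWeight) / (maxWeight - minWeight);
        // Clamp so that weight == maxWeight lands in the top bucket, not one past it.
        return (int) Math.min(buckets - 1, Math.floor(normalized * buckets));
    }

    // Prepend the bucket as a single byte in front of the UTF-8 term bytes.
    // Assumes bucket < 256; the byte is meant to be compared as unsigned.
    public static byte[] encode(String term, int bucket) {
        byte[] utf8 = term.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[utf8.length + 1];
        out[0] = (byte) bucket;
        System.arraycopy(utf8, 0, out, 1, utf8.length);
        return out;
    }

    public static void main(String[] args) {
        long min = 0;
        long max = 1_000_000;
        int buckets = 256;
        // Two different weights end up indistinguishable after quantization.
        System.out.println(bucket(1000, min, max, buckets));
        System.out.println(bucket(3000, min, max, buckets));
    }
}
```

Both calls in `main` print `0`: the weights differ by a factor of three, but with 256 buckets over this range the suggester can no longer tell them apart.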

      Recently the question was raised: if you build Lucene's FST with PositiveIntOutputs, does it behave the same as a weighted FST (wFST) over the tropical semiring?
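For reference, the tropical semiring combines weights along a path with addition and across alternative paths with the minimum. As far as we understand, Lucene's `PositiveIntOutputs` also adds outputs along a path and uses the minimum as its `common` operation when outputs are shared between arcs, which is why the comparison seems plausible. Here $w(e_i)$ is the weight of the $i$-th arc on a path $\pi$ and $P(q)$ is the set of accepting paths leaving state $q$:

```latex
\mathbb{T} = \big(\mathbb{R}_{\ge 0} \cup \{+\infty\},\ \oplus,\ \otimes,\ \bar{0},\ \bar{1}\big),
\qquad a \oplus b = \min(a, b), \quad a \otimes b = a + b, \quad \bar{0} = +\infty, \quad \bar{1} = 0

w(\pi) = \bigotimes_{i} w(e_i) = \sum_{i} w(e_i),
\qquad d(q) = \bigoplus_{\pi \in P(q)} w(\pi) = \min_{\pi \in P(q)} \sum_{i} w(e_i)
```

The best suggestion from state $q$ is then the path that attains $d(q)$, i.e. the shortest path.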

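      In other words, after walking the arcs for the input prefix, we would instead follow the arc with the minimum output at each node to find the 'shortest path' to the
      best suggestion (the one with the highest score, once scores are encoded as costs so that lower is better).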
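A minimal Java sketch of the greedy 'shortest path' lookup, written against a hypothetical trie rather than Lucene's FST classes (`ShortestPathSketch`, `Node`, `add` and `best` are illustrative names only). It assumes scores are turned into costs as `Integer.MAX_VALUE - score`, so lower is better, and that each node stores the minimum cost in its subtree, which is what pushing outputs toward the root gives you. A child's arc output is then `child.minCost - parent.minCost`, so following the smallest arc output is the same as following the child with the smallest `minCost`.

```java
import java.util.Map;
import java.util.TreeMap;

public class ShortestPathSketch {
    // Trie node; minCost is the smallest cost of any entry in this subtree.
    static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        long minCost = Long.MAX_VALUE;
        boolean isFinal;
        long finalCost = Long.MAX_VALUE;
    }

    private final Node root = new Node();

    // Assumption: score >= 0; higher score becomes lower cost.
    public void add(String term, int score) {
        long cost = (long) Integer.MAX_VALUE - score;
        Node node = root;
        node.minCost = Math.min(node.minCost, cost);
        for (char c : term.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
            node.minCost = Math.min(node.minCost, cost); // push the minimum up the path
        }
        node.isFinal = true;
        node.finalCost = Math.min(node.finalCost, cost);
    }

    // Returns the lowest-cost (highest-score) completion of prefix, or null if none.
    public String best(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        StringBuilder sb = new StringBuilder(prefix);
        // Stop when this node's own entry attains the subtree minimum; otherwise some
        // child must attain it (arc output 0), so descend into that child greedily.
        while (!(node.isFinal && node.finalCost == node.minCost)) {
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                if (e.getValue().minCost == node.minCost) {
                    sb.append(e.getKey());
                    node = e.getValue();
                    break;
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ShortestPathSketch s = new ShortestPathSketch();
        s.add("lucene", 10);
        s.add("lucid", 50);
        s.add("luck", 30);
        System.out.println(s.best("luc"));  // lucid (highest score)
        System.out.println(s.best("luck")); // luck
    }
}
```

No weights are quantized here, and because the minimums are pushed toward the root the single best completion is found without backtracking. Returning the top N suggestions would need more than this greedy walk, for example a priority queue over partial paths.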

      This means we wouldn't need to quantize weights at all, and it might make some operations (e.g. adding fuzzy matching) a lot easier.

        Attachments

        1. LUCENE-3714.patch
          38 kB
          Robert Muir
        2. LUCENE-3714.patch
          28 kB
          Robert Muir
        3. LUCENE-3714.patch
          31 kB
          Robert Muir
        4. LUCENE-3714.patch
          31 kB
          Robert Muir
        5. LUCENE-3714.patch
          18 kB
          Michael McCandless
        6. LUCENE-3714.patch
          7 kB
          Robert Muir
        7. out.png
          63 kB
          Dawid Weiss
        8. TestMe.java
          4 kB
          Dawid Weiss


              People

              • Assignee:
                rcmuir Robert Muir
                Reporter:
                rcmuir Robert Muir
              • Votes:
                0
                Watchers:
                1

                Dates

                • Created:
                  Updated:
                  Resolved: