  1. Lucene - Core
  2. LUCENE-3714

add suggester that uses shortest path/wFST instead of buckets

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: modules/spellchecker
    • Labels: None
    • Lucene Fields: New

    Description

      Currently the FST suggester (really an FSA) quantizes weights into buckets (e.g. a single byte) and puts them in front of the word.
      This makes it fast, but you lose granularity in your suggestions.

      Lately the question was raised: if you build Lucene's FST with PositiveIntOutputs, does it behave the same as a tropical semiring wFST?

      In other words, after completing the word, we instead traverse min(output) at each node to find the 'shortest path' to the
      best suggestion (with the highest score).

      This means we wouldn't need to quantize weights at all, and it might make some operations (e.g. adding fuzzy matching) a lot easier.
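
      The min(output) traversal described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a plain in-memory trie rather than Lucene's FST, the class name ShortestPathSuggester and its methods are hypothetical, and weights are inverted into costs (cost = Integer.MAX_VALUE - weight) so that the cheapest path corresponds to the highest-scoring suggestion. Each node remembers the smallest cost found anywhere beneath it, which plays the role of the pushed outputs in a weighted FST.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only: a weighted trie, not Lucene's FST implementation. */
class ShortestPathSuggester {
    private static final long NO_WORD = Long.MAX_VALUE;

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        long wordCost = NO_WORD; // cost of the word ending at this node, if any
        long minCost = NO_WORD;  // smallest cost of any word at or below this node
    }

    private final Node root = new Node();

    /** Adds a word; a higher weight means a better suggestion. */
    void add(String word, int weight) {
        // Assumption: invert the weight so that minimum cost == maximum weight.
        long cost = (long) Integer.MAX_VALUE - weight;
        Node node = root;
        node.minCost = Math.min(node.minCost, cost);
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
            node.minCost = Math.min(node.minCost, cost);
        }
        node.wordCost = Math.min(node.wordCost, cost);
    }

    /** Returns the highest-weighted completion of the prefix, or null if there is none. */
    String topSuggestion(String prefix) {
        // Step 1: walk the prefix exactly.
        Node node = root;
        for (int i = 0; i < prefix.length(); i++) {
            node = node.children.get(prefix.charAt(i));
            if (node == null) {
                return null;
            }
        }
        if (node.minCost == NO_WORD) {
            return null; // empty trie
        }
        // Step 2: follow the cheapest child until the word ending at the
        // current node is itself the cheapest completion in its subtree.
        StringBuilder sb = new StringBuilder(prefix);
        while (node.wordCost != node.minCost) {
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                if (e.getValue().minCost == node.minCost) {
                    sb.append(e.getKey());
                    node = e.getValue();
                    break;
                }
            }
        }
        return sb.toString();
    }
}
```

      With the words foo (weight 10), food (weight 30) and foobar (weight 50) added, topSuggestion("fo") walks straight to foobar without any bucketing of the weights. Returning the top N suggestions instead of one would need a priority queue over partial paths, which this sketch leaves out.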

      Attachments

        1. LUCENE-3714.patch
          38 kB
          Robert Muir
        2. LUCENE-3714.patch
          28 kB
          Robert Muir
        3. LUCENE-3714.patch
          31 kB
          Robert Muir
        4. LUCENE-3714.patch
          31 kB
          Robert Muir
        5. LUCENE-3714.patch
          18 kB
          Michael McCandless
        6. LUCENE-3714.patch
          7 kB
          Robert Muir
        7. out.png
          63 kB
          Dawid Weiss
        8. TestMe.java
          4 kB
          Dawid Weiss

            People

              Assignee: rcmuir Robert Muir
              Reporter: rcmuir Robert Muir
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: