Solr / SOLR-2888

FSTSuggester refactoring: utf8 storage, external sorts (OOM prevention), code cleanups


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: spellchecker
    • Labels: None

    Description

      This issue addresses several problems:

      • Terms were previously stored and looked up as UTF-16; they are now stored and looked up as UTF-8.
      • Construction would run out of memory (OOM) with a large number of terms because all entries had to be sorted in memory. Sorter APIs have been added, together with an implementation of external (on-disk) sorting (Robert Muir).
      • The FSTLookup class has been split and refactored into FSTCompletion and FSTCompletionBuilder. FSTCompletionLookup remains as a facade that connects all the pieces and implements the Lookup interface. For large inputs, use FSTCompletionBuilder directly (and pre-bucket your input weights).
      • Automatic bucketing in FSTCompletionLookup has changed from linear min/max discretization to sorting all values and then dividing them into ranges. Empirically, this handles a wide range of weight distributions quite well. If you need something very specific, use FSTCompletionBuilder directly (providing your own buckets), construct the automaton, and then load it with FSTCompletionLookup.
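
      The difference between the two bucketing strategies can be sketched in plain Java. This is only an illustration of the general idea, not the code from the patch: the class name BucketingSketch, both method names, and the exact rounding rules are assumptions made for this example, and no Lucene or Solr classes are used.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not the FSTCompletionLookup implementation. */
class BucketingSketch {

    /**
     * Linear min/max discretization: split the value range [min, max]
     * into equal-width intervals and map each weight to its interval.
     */
    static int[] linearBuckets(long[] weights, int buckets) {
        long min = Arrays.stream(weights).min().orElse(0);
        long max = Arrays.stream(weights).max().orElse(0);
        double range = Math.max(1, max - min); // avoid division by zero
        int[] out = new int[weights.length];
        for (int i = 0; i < weights.length; i++) {
            int b = (int) ((weights[i] - min) / range * buckets);
            out[i] = Math.min(buckets - 1, b); // the maximum lands in the last bucket
        }
        return out;
    }

    /**
     * Rank-based bucketing: sort the weights, then cut the sorted order
     * into ranges holding (roughly) the same number of entries.
     * Note: in this sketch, equal weights may straddle a range boundary.
     */
    static int[] rankBuckets(long[] weights, int buckets) {
        Integer[] order = new Integer[weights.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort entry indices by ascending weight.
        Arrays.sort(order, (a, b) -> Long.compare(weights[a], weights[b]));
        int[] out = new int[weights.length];
        for (int rank = 0; rank < order.length; rank++) {
            out[order[rank]] = (int) ((long) rank * buckets / order.length);
        }
        return out;
    }
}
```

      With the weights 1, 2, 3, 4, 5, 6, 7, 1000 and four buckets, the linear variant puts the first seven entries into bucket 0 and only the outlier into bucket 3, while the rank-based variant places two entries in each bucket. A single large outlier therefore no longer collapses the remaining weights into one bucket.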

      Attachments

        1. SOLR-2888.patch
          103 kB
          Dawid Weiss
        2. SOLR-2888.patch
          101 kB
          Dawid Weiss
        3. SOLR-2888.patch
          107 kB
          Dawid Weiss
        4. SOLR-2888_backport.patch
          12 kB
          Robert Muir


          People

            rcmuir Robert Muir
            dweiss Dawid Weiss