Solr / SOLR-2887

FSTSuggester shouldn't OOM on large inputs

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: spellchecker
    • Labels: None

    Description

      Currently the input to FSTSuggester needs to be re-sorted, and this sort is done entirely in memory. That defeats the purpose of the component: everything else is very efficient, but large inputs never get that far because construction fails with an OutOfMemoryError.

      Robert suggested spilling to disk and merge-sorting on disk. I suggested creating a Lucene index and then either enumerating its terms for automaton construction or taking the automaton directly from the index structure (if it isn't pruned).
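
To illustrate the spill-to-disk idea, here is a minimal sketch of an external merge sort in plain Java. It is not the FSTSuggester code or any Lucene API: the class name `ExternalSortSketch`, the `maxInMemory` parameter, and the `sink` callback (standing in for whatever consumes the sorted terms during automaton construction) are all hypothetical. It assumes terms contain no line breaks and compares them with `String.compareTo`; a real implementation would probably need to sort in whatever byte order the automaton builder expects.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical sketch: sorts terms using at most maxInMemory terms of heap at a time.
final class ExternalSortSketch {

    // One sorted run on disk plus the smallest line not yet consumed from it.
    private static final class Run implements Comparable<Run> {
        final BufferedReader reader;
        String head;

        Run(Path file) throws IOException {
            reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
            head = reader.readLine();
        }

        // Moves to the next line; returns false once the run is exhausted.
        boolean advance() throws IOException {
            head = reader.readLine();
            return head != null;
        }

        @Override
        public int compareTo(Run other) {
            return head.compareTo(other.head);
        }
    }

    // Sorts the in-memory chunk, writes it to a temp file, and empties the chunk.
    private static Path spill(List<String> chunk) throws IOException {
        Collections.sort(chunk);
        Path file = Files.createTempFile("sort-run", ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String term : chunk) {
                w.write(term);
                w.newLine();
            }
        }
        chunk.clear();
        return file;
    }

    // Reads all terms, spilling sorted runs to disk, then k-way merges them into sink.
    public static void sort(Iterator<String> input, int maxInMemory, Consumer<String> sink) {
        List<Path> runs = new ArrayList<>();
        try {
            List<String> chunk = new ArrayList<>();
            while (input.hasNext()) {
                chunk.add(input.next());
                if (chunk.size() >= maxInMemory) {
                    runs.add(spill(chunk));
                }
            }
            if (!chunk.isEmpty()) {
                runs.add(spill(chunk));
            }

            // The queue always holds the current head of every unfinished run.
            PriorityQueue<Run> queue = new PriorityQueue<>();
            try {
                for (Path run : runs) {
                    queue.add(new Run(run));
                }
                while (!queue.isEmpty()) {
                    Run smallest = queue.poll();
                    sink.accept(smallest.head);
                    if (smallest.advance()) {
                        queue.add(smallest);
                    } else {
                        smallest.reader.close();
                    }
                }
            } finally {
                for (Run leftover : queue) {
                    leftover.reader.close();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (Path run : runs) {
                try {
                    Files.deleteIfExists(run);
                } catch (IOException ignored) {
                    // Best-effort cleanup of temporary run files.
                }
            }
        }
    }
}
```

With this shape, heap usage during the merge is bounded by one buffered reader per run rather than by the size of the input, and the sorted terms can be streamed straight into construction through `sink` without being collected first.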

            People

              Assignee: Dawid Weiss (dweiss)
              Reporter: Dawid Weiss (dweiss)
              Votes: 0
              Watchers: 2
