Solr / SOLR-2761

FSTLookup should use long-tail like discretization instead of proportional (linear)

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 3.4
    • Fix Version/s: 3.5, 3.6, 4.0-ALPHA
    • Component/s: spellchecker
    • Labels: None

    Description

      The Suggester's FSTLookup implementation discretizes term frequencies into a configurable number of buckets (the "weightBuckets" setting) to work around FST limitations. A source frequency is mapped to a bucket proportionally (i.e., linearly) over the range between the minimum and maximum values. I don't think this makes sense, given the well-known long-tail-like distribution of term frequencies. As a result, I've found it necessary to increase weightBuckets substantially, to more than 100, to get quality suggestions.
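
      To illustrate the concern, here is a minimal sketch comparing a proportional mapping with a logarithmic one. It is not the actual FSTLookup code: the class and method names (BucketingSketch, linearBucket, logBucket) are hypothetical, and log scaling is only one possible long-tail-friendly choice, assumed here for illustration.

```java
public final class BucketingSketch {

    // Proportional (linear) mapping of a weight in [min, max] to a bucket
    // index in [0, buckets - 1]. Hypothetical stand-in for the behavior
    // described in the issue, not the real FSTLookup implementation.
    public static int linearBucket(long weight, long min, long max, int buckets) {
        if (max <= min) {
            return 0; // degenerate range: everything lands in one bucket
        }
        double fraction = (double) (weight - min) / (double) (max - min);
        return (int) Math.min(buckets - 1, (long) (fraction * buckets));
    }

    // Logarithmic mapping: compresses the few very large weights and spreads
    // the many small ones across more buckets. An assumed alternative, not
    // necessarily what the duplicate issue adopted.
    public static int logBucket(long weight, long min, long max, int buckets) {
        if (max <= min) {
            return 0;
        }
        double fraction = Math.log1p(weight - min) / Math.log1p(max - min);
        return (int) Math.min(buckets - 1, (long) (fraction * buckets));
    }

    public static void main(String[] args) {
        long min = 1;
        long max = 1_000_000;
        int buckets = 10;
        long[] weights = {1, 2, 5, 10, 100, 1_000_000};
        for (long w : weights) {
            System.out.println(w + " linear=" + linearBucket(w, min, max, buckets)
                    + " log=" + logBucket(w, min, max, buckets));
        }
    }
}
```

      With weights ranging from 1 to 1,000,000 and 10 buckets, the linear mapping puts a weight of 100 in the same bucket as a weight of 1, while the logarithmic mapping separates them. That is consistent with needing a much larger weightBuckets value to get usable resolution under the proportional scheme.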

            People

              Assignee: Dawid Weiss (dweiss)
              Reporter: David Smiley (dsmiley)
              Votes: 0
              Watchers: 0

              Dates

                Created:
                Updated:
                Resolved: