Solr / SOLR-2761

FSTLookup should use long-tail like discretization instead of proportional (linear)

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 3.4
    • Fix Version/s: 3.5, 3.6, 4.0-ALPHA
    • Component/s: spellchecker
    • Labels:
      None

      Description

      The Suggester's FSTLookup implementation discretizes term frequencies into a configurable number of buckets (set via "weightBuckets") to work around FST limitations. A source frequency is mapped to a bucket proportionally (i.e. linearly) between the minimum and maximum values. I don't think this makes sense given the well-known long-tail-like distribution of term frequencies. As a result, I've found it necessary to increase weightBuckets substantially, to more than 100, to get quality suggestions.
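
      To illustrate the concern, here is a minimal sketch that compares a proportional mapping with a logarithmic one. It is not the FSTLookup code: the class name WeightBucketing, both method names, and the choice of a logarithm as the "long-tail" mapping are assumptions made only for this example.

```java
// Illustrative sketch only; names and the log-based mapping are hypothetical,
// not taken from Solr or Lucene.
final class WeightBucketing {

    // Proportional (linear) mapping: the bucket is the weight's relative
    // position between min and max, scaled to the number of buckets.
    static int linearBucket(long weight, long min, long max, int buckets) {
        if (max <= min) {
            return 0; // degenerate range: everything shares one bucket
        }
        double fraction = (double) (weight - min) / (double) (max - min);
        return (int) Math.min(buckets - 1, (long) (fraction * buckets));
    }

    // Assumed alternative: the same mapping applied on a logarithmic scale,
    // so that low and mid frequencies are spread over more buckets.
    static int logBucket(long weight, long min, long max, int buckets) {
        if (max <= min) {
            return 0;
        }
        double fraction = Math.log(weight - min + 1.0) / Math.log(max - min + 1.0);
        return (int) Math.min(buckets - 1, (long) (fraction * buckets));
    }

    public static void main(String[] args) {
        long min = 1L;
        long max = 1_000_000L;
        int buckets = 10;
        long[] weights = {1L, 10L, 100L, 10_000L, 100_000L, 1_000_000L};
        for (long w : weights) {
            System.out.println(w + ": linear=" + linearBucket(w, min, max, buckets)
                    + " log=" + logBucket(w, min, max, buckets));
        }
    }
}
```

      With 10 buckets and weights ranging from 1 to 1,000,000, the linear mapping puts every weight up to 100,000 into bucket 0, while the logarithmic mapping places 10, 100 and 10,000 into buckets 1, 3 and 6. That collapse of the long tail into one bucket is consistent with needing a much larger weightBuckets value under the linear scheme.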

          Activity

          David Smiley created issue -
          Dawid Weiss made changes -
          Field: Original Value -> New Value
          Assignee: (none) -> Dawid Weiss [ dweiss ]
          Dawid Weiss made changes -
          Link: (none) -> This issue is blocked by SOLR-2762 [ SOLR-2762 ]
          Dawid Weiss made changes -
          Link: (none) -> This issue is part of SOLR-2888 [ SOLR-2888 ]
          Dawid Weiss made changes -
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Fix Version/s: (none) -> 3.5 [ 12317876 ]
          Fix Version/s: (none) -> 3.6 [ 12319065 ]
          Fix Version/s: (none) -> 4.0 [ 12314992 ]
          Resolution: (none) -> Invalid [ 6 ]
          Dawid Weiss made changes -
          Resolution: Invalid [ 6 ] -> (none)
          Status: Resolved [ 5 ] -> Reopened [ 4 ]
          Dawid Weiss made changes -
          Status: Reopened [ 4 ] -> Resolved [ 5 ]
          Resolution: (none) -> Duplicate [ 3 ]
          Uwe Schindler made changes -
          Status: Resolved [ 5 ] -> Closed [ 6 ]
          Dawid Weiss made changes -
          Link: (none) -> This issue is related to LUCENE-3714 [ LUCENE-3714 ]

            People

            • Assignee: Dawid Weiss
            • Reporter: David Smiley
            • Votes: 0
            • Watchers: 0
