Details
- Type: Improvement
- Status: Closed
- Priority: Minor
- Resolution: Duplicate
- 3.4
- None
Description
The Suggester's FSTLookup implementation discretizes term frequencies into a configurable number of buckets (the "weightBuckets" setting) to work around FST limitations. A source frequency is mapped to a bucket proportionally (i.e. linearly) across the range between the minimum and maximum frequency. I don't think this makes sense given the well-known long-tailed distribution of term frequencies. As a result, I've found it necessary to increase weightBuckets substantially, to more than 100, to get quality suggestions.
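To illustrate the complaint, here is a minimal sketch of linear bucketing next to a logarithmic alternative. It is not the FSTLookup source: the class name `WeightBucketSketch`, both method names, and the sample frequencies are hypothetical, and the logarithmic mapping is only one assumed way to spread a long-tailed distribution across buckets (it assumes all frequencies are at least 1).

```java
public final class WeightBucketSketch {

    /** Linear mapping of freq from [min, max] onto bucket indexes 0..buckets-1. */
    public static int linearBucket(long freq, long min, long max, int buckets) {
        if (max <= min) {
            return 0; // degenerate range: everything shares one bucket
        }
        double fraction = (double) (freq - min) / (double) (max - min);
        // Clamp so that freq == max lands in the last bucket rather than one past it.
        return (int) Math.min(buckets - 1, (long) (fraction * buckets));
    }

    /** Same idea, but proportional in log space (assumes min >= 1 and freq >= min). */
    public static int logBucket(long freq, long min, long max, int buckets) {
        if (max <= min) {
            return 0;
        }
        double fraction = (Math.log(freq) - Math.log(min))
                / (Math.log(max) - Math.log(min));
        return (int) Math.min(buckets - 1, (long) (fraction * buckets));
    }

    public static void main(String[] args) {
        // Hypothetical long-tailed frequencies: most terms are rare, one is very common.
        long[] freqs = {1, 10, 100, 1_000, 10_000, 100_000, 1_000_000};
        long min = 1;
        long max = 1_000_000;
        int buckets = 10;
        for (long f : freqs) {
            System.out.println(f + " linear=" + linearBucket(f, min, max, buckets)
                    + " log=" + logBucket(f, min, max, buckets));
        }
    }
}
```

With these sample values and 10 buckets, the linear mapping places every frequency below the maximum in bucket 0, so a term seen 100,000 times is indistinguishable from one seen once. That is consistent with needing far more buckets before the ranking becomes useful.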
Issue Links
- is blocked by: SOLR-2762 FSTLookup returns one less suggestion than it should when onlyMorePopular=true (Closed)
- is part of: SOLR-2888 FSTSuggester refactoring: utf8 storage, external sorts (OOM prevention), code cleanups (Closed)
- is related to: LUCENE-3714 add suggester that uses shortest path/wFST instead of buckets (Closed)