The Suggester's FSTLookup implementation discretizes the term frequencies into a configurable number of buckets (configurable as "weightBuckets") in order to deal with FST limitations. The mapping of a source frequency into a bucket is a proportional (i.e. linear) mapping from the minimum and maximum value. I don't think this makes sense at all given the well-known long-tail like distribution of term frequencies. As a result of this problem, I've found it necessary to increase weightBuckets substantially, like >100, to get quality suggestions.