My thinking was that the usual scenario is that you submit autosuggest queries soon after the user starts typing the query, and the highest perceived value of such functionality is when it can suggest complete meaningful phrases and not just individual terms. For example, when you start typing "token sug" it won't suggest "token sugar" but will instead suggest "token suggestions".
Yes, but the decision to suggest a complete phrase or an individual term should be up to the user. This is controlled by the "queryAnalyzerFieldType" in SpellCheckComponent. We will index the tokens returned by that analyzer, so the user can configure whichever behavior they want. For example, if it is KeywordAnalyzer, we will index/suggest phrases, and if it is WhitespaceAnalyzer, we will index/suggest individual terms.
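To make the difference concrete, here is a rough sketch in plain Python (not Solr or Lucene code). The two `*_analyze` functions are hypothetical stand-ins for KeywordAnalyzer and WhitespaceAnalyzer, the sorted list stands in for the trie, and the sample phrases are made up for illustration.

```python
from bisect import bisect_left

def keyword_analyze(text):
    # Stand-in for KeywordAnalyzer: the whole input is a single token.
    return [text]

def whitespace_analyze(text):
    # Stand-in for WhitespaceAnalyzer: one token per whitespace-separated word.
    return text.split()

def build_index(entries, analyze):
    # Index every token the analyzer returns; keep it sorted for prefix lookup.
    return sorted({tok for entry in entries for tok in analyze(entry)})

def suggest(index, prefix):
    # Collect all indexed entries that start with the prefix.
    start = bisect_left(index, prefix)
    matches = []
    for entry in index[start:]:
        if not entry.startswith(prefix):
            break
        matches.append(entry)
    return matches

# Illustrative data only.
entries = ["token suggestions", "brown sugar"]

phrase_index = build_index(entries, keyword_analyze)
token_index = build_index(entries, whitespace_analyze)

print(suggest(phrase_index, "token sug"))  # phrase-level completion
print(suggest(token_index, "sug"))         # per-token completion of the last word
```

With the phrase index, "token sug" can only complete to a phrase that was actually indexed. With the token index, the last word "sug" completes to both "sugar" and "suggestions", so "token sugar" becomes a possible (and less useful) suggestion.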
Such as? What you put there is what you get, so the fact that we are getting complete phrases as suggestions is a consequence of the choice above: the trie in this case is populated with phrases. If we populate it with tokens, then we can return per-token suggestions, but again we lose the added value I mentioned above.
My point was that SpellingResult is too coarse. It is a complete result (for all tokens given by "queryAnalyzerFieldType"). If that analyzer gives us multiple tokens, then we must get suggestions for each. In that case, returning a SpellingResult for each token is not right. Instead, the Suggestor should combine the suggestions for all tokens into a single SpellingResult object. I don't have an alternative to suggest. It looks like we may need to invent a custom type that represents the (suggestion, frequency) pair.
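As a rough illustration of that idea (plain Python, not the Solr API): the `Suggestion` type, the `lookup` and `suggest_all` helpers, and the frequencies below are all hypothetical. The point is only that each lookup returns (suggestion, frequency) pairs and the caller merges them into one combined result keyed by input token.

```python
from typing import Dict, List, NamedTuple

class Suggestion(NamedTuple):
    # Hypothetical (suggestion, frequency) pair.
    term: str
    freq: int

def lookup(index: Dict[str, int], prefix: str, limit: int = 5) -> List[Suggestion]:
    # Per-token lookup: matching terms, most frequent first, ties broken by term.
    matches = [Suggestion(t, f) for t, f in index.items() if t.startswith(prefix)]
    matches.sort(key=lambda s: (-s.freq, s.term))
    return matches[:limit]

def suggest_all(index: Dict[str, int], tokens: List[str]) -> Dict[str, List[Suggestion]]:
    # One combined result: each input token maps to its own ranked suggestions.
    return {tok: lookup(index, tok) for tok in tokens}

# Made-up term frequencies, for illustration only.
index = {"sugar": 12, "suggestions": 40, "token": 25, "tokenizer": 7}

result = suggest_all(index, ["tok", "sug"])
for token, suggestions in result.items():
    print(token, suggestions)
```

Whether the combined result ends up being a SpellingResult or something new, the lookup layer would then only need to know about the small pair type.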
For now I'm sure that we do NOT want to use the impl. of RadixTree in this patch, because it doesn't support our use case. I'll prepare a patch that removes this impl. The other implementations seem comparable with respect to speed, based on casual tests using /usr/share/dict/words, but I haven't run any exact benchmarks yet.
OK. Go ahead with the patch and I'll try to find some time to compare the two methods. What about DAWGs? Are we still considering them?
Shouldn't we be creating a separate AutoSuggestComponent, like the SpellCheckComponent, having its own prepare, process and inform functions?
We could do that but as Andrej noted, we'd end up re-implementing a lot of its functionality. I'm not sure if it is worth it. I agree that it'd be odd using parameters prefixed with "spellcheck" for auto-suggest and it'd have been easier if it were vice-versa. Does anybody have a suggestion?