Thanks Michael McCandless for the review!
In collect, we seem to assume the suggest searcher will never call
collect more than num times? How is that? If so, can you add that to
the javadocs, and maybe add an assert upto < num in collect?
Can we just allocate scoreDocs up front instead of lazily?
In the javadocs, instead of "one hit can be..." maybe "one doc can
be..."? Hit is a tricky word in this context since it could be a doc
or a suggestion...
I have rewritten TopSuggestDocsCollector to use a priority queue at the top level instead, somewhat similar to TopDocsCollector.
Completions across segments are now collected in the same pq, which allows early termination for suggesters at the segment level:
when a collected completion overflows the pq, we can disregard the rest of the completions for that segment,
because completions are collected in order of their scores.
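To make the early-termination argument concrete, here is a minimal sketch of the idea, not the code in the patch. SharedQueueSketch, Completion, collectSegment, topScores and segment are hypothetical names invented for illustration; the only assumption taken from the description above is that each segment delivers its completions in descending score order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a top-level suggest collector; NOT the Lucene class. */
final class SharedQueueSketch {

  /** A scored completion (hypothetical type, for illustration only). */
  static final class Completion {
    final int doc;
    final float score;

    Completion(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  private final int num;
  // Min-heap on score: the head is the weakest completion currently kept.
  private final PriorityQueue<Completion> pq;

  SharedQueueSketch(int num) {
    if (num < 1) {
      throw new IllegalArgumentException("num must be >= 1");
    }
    this.num = num;
    Comparator<Completion> byScore = Comparator.comparingDouble(c -> c.score);
    this.pq = new PriorityQueue<>(num, byScore);
  }

  /** Offers one completion; returns false if it was not competitive. */
  boolean collect(Completion c) {
    if (pq.size() < num) {
      pq.add(c);
      return true;
    }
    if (c.score > pq.peek().score) {
      pq.poll(); // evict the weakest entry to make room
      pq.add(c);
      return true;
    }
    return false;
  }

  /**
   * Collects one segment whose completions arrive in descending score order.
   * Stops at the first non-competitive completion, since everything after it
   * in this segment scores no higher. Returns how many completions were visited.
   */
  int collectSegment(List<Completion> byScoreDescending) {
    int visited = 0;
    for (Completion c : byScoreDescending) {
      visited++;
      if (!collect(c)) {
        break;
      }
    }
    return visited;
  }

  /** Scores of the kept completions, best first. */
  List<Float> topScores() {
    List<Completion> kept = new ArrayList<>(pq);
    kept.sort((a, b) -> Float.compare(b.score, a.score));
    List<Float> scores = new ArrayList<>();
    for (Completion c : kept) {
      scores.add(c.score);
    }
    return scores;
  }

  /** Builds a segment from scores that are already in descending order. */
  static List<Completion> segment(int docBase, float... scores) {
    List<Completion> out = new ArrayList<>();
    for (int i = 0; i < scores.length; i++) {
      out.add(new Completion(docBase + i, scores[i]));
    }
    return out;
  }
}
```

With num = 3, collecting a first segment with scores 9, 7, 5 and then a second segment with scores 8, 6, 4, 2 visits only two completions of the second segment: 8 displaces 5, 6 is not competitive against the remaining 9, 8, 7, and the rest of that segment is skipped.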
In SuggestIndexSearcher, does it really ever make sense to take a
generic Collector/LeafCollector? Can we instead just strongly type
the params to all the methods to be TopSuggestDocsCollector?
Thanks for the suggestion! The generic Collector/LeafCollector has been removed; the method now takes a TopSuggestDocsCollector:
public void suggest(String field, CharSequence key, int num, Filter filter, TopSuggestDocsCollector collector)
"In case a filter has to be applied, the queue size is doubled" is not
quite correct? Maybe change the logic there so the int queueSize is
first computed, and then if filter is enabled, it's doubled?
The queueSize is now increased by half the number of live docs in the segment instead. If a filter is applied, the queue size should
be increased with respect to the number of documents.
If the applied filter filters out <= half of the top-scoring documents for a query prefix, then the search is admissible.
If a filter is too restrictive, then the search is inadmissible. A workaround would be to multiply num by some factor;
in this case early termination might help (if TopSuggestDocsCollector is initialized with the original num). Thoughts?
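As a rough illustration of the sizing rule and of what "inadmissible" means here, the sketch below uses a simplified model rather than the NRTSuggester code. FilteredQueueSizeSketch, queueSize and topFiltered are hypothetical names, and the model assumes a single segment in which every document is live and only the first queueSize documents (in descending score order) are ever examined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical model of the queue sizing described above; NOT the patch itself. */
final class FilteredQueueSizeSketch {

  /** Without a filter the queue holds num entries; with one, add half the live docs. */
  static int queueSize(int num, int liveDocs, boolean hasFilter) {
    if (!hasFilter) {
      return num;
    }
    long size = (long) num + liveDocs / 2; // long arithmetic avoids int overflow
    return (int) Math.min(size, Integer.MAX_VALUE);
  }

  /**
   * Simplified search: docsByScore holds doc ids in descending score order
   * (assumption: all of them are live). Only the first queueSize docs are
   * examined, so matches beyond that point are never seen.
   */
  static List<Integer> topFiltered(int[] docsByScore, int num, IntPredicate accept) {
    int limit = Math.min(docsByScore.length, queueSize(num, docsByScore.length, true));
    List<Integer> hits = new ArrayList<>();
    for (int i = 0; i < limit && hits.size() < num; i++) {
      if (accept.test(docsByScore[i])) {
        hits.add(docsByScore[i]);
      }
    }
    return hits;
  }
}
```

In this model, with 10 live docs and num = 2 the queue size is 7. A filter that accepts every second doc still yields two hits, but a filter that accepts only the two lowest-scoring docs yields none, even though two matching docs exist. That second case is the inadmissible one described above.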
- SuggestIndexSearcher cleanup
- TopSuggestDocsCollector re-write
- remove WeightProcessor from NRTSuggester
- added more tests (including boundary cases for deleted/filtered out documents)