Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
New
Description
TokenStreamToAutomaton in Lucene core is used by the AnalyzingSuggester (incl. FuzzySuggester subclass ) and NRT Document Suggester and soon the SolrTextTagger. It has a setting preservePositionIncrements defaulting to true. If it's set to false (e.g. to ignore stopwords) and if there is a trailing position increment greater than 1, TS2A will still add position increments (holes) into the automata even though it was configured not to.
I'm filing this issue separate from LUCENE-8332 where I first found it. The fix is very simple but I'm concerned about back-compat ramifications so I'm filing it separately. I'll attach a patch to show the problem.