  1. Lucene - Core
  2. LUCENE-8344

TokenStreamToAutomaton doesn't ignore trailing posInc when preservePositionIncrements=false

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.4
    • Component/s: modules/suggest
    • Labels: None
    • Lucene Fields: New

    Description

      TokenStreamToAutomaton (TS2A) in Lucene core is used by the AnalyzingSuggester (including its FuzzySuggester subclass), the NRT Document Suggester, and soon the SolrTextTagger. It has a setting, preservePositionIncrements, that defaults to true. If it's set to false (e.g. to ignore stopwords) and there is a trailing position increment greater than 1, TS2A will still add position increments (holes) to the automaton even though it was configured not to.
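
      To make the mismatch concrete, here is a minimal, self-contained sketch. It is a simplified model and not the actual TokenStreamToAutomaton code: the class TrailingPosIncSketch, the method countHoles, and the flag applySettingToTrailing are hypothetical names, and the counting rule (an increment of n between tokens yields n - 1 holes, a trailing increment of n yields n holes) is an assumption made for illustration.

```java
public class TrailingPosIncSketch {

    /**
     * Hypothetical model of how many holes end up in the automaton.
     *
     * @param posIncs                    position increment of each token, in order
     * @param trailingPosInc             increment reported after the last token
     * @param preservePositionIncrements models the setting described above
     * @param applySettingToTrailing     false models the reported behavior,
     *                                   true models the expected behavior
     */
    public static int countHoles(int[] posIncs, int trailingPosInc,
                                 boolean preservePositionIncrements,
                                 boolean applySettingToTrailing) {
        int holes = 0;
        for (int posInc : posIncs) {
            int effective = posInc;
            // With the setting off, gaps between tokens are collapsed.
            if (!preservePositionIncrements && effective > 1) {
                effective = 1;
            }
            holes += Math.max(0, effective - 1);
        }
        int trailing = trailingPosInc;
        // The reported problem: this step is skipped for the trailing increment.
        if (!preservePositionIncrements && applySettingToTrailing) {
            trailing = 0;
        }
        holes += Math.max(0, trailing);
        return holes;
    }

    public static void main(String[] args) {
        int[] posIncs = {1, 2};   // second token follows one removed stopword
        int trailingPosInc = 2;   // two removed stopwords after the last token

        // Setting on: all gaps are kept.
        System.out.println(countHoles(posIncs, trailingPosInc, true, false));   // prints 3
        // Setting off, reported behavior: trailing holes survive.
        System.out.println(countHoles(posIncs, trailingPosInc, false, false));  // prints 2
        // Setting off, expected behavior: no holes at all.
        System.out.println(countHoles(posIncs, trailingPosInc, false, true));   // prints 0
    }
}
```

      In this model the only difference between the reported and the expected behavior is whether the trailing increment is subject to the same setting as the increments between tokens.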

      I first found this in LUCENE-8332 but am filing it separately: the fix is very simple, but I'm concerned about back-compat ramifications. I'll attach a patch to show the problem.

      Attachments

        1. LUCENE-8344.patch
          16 kB
          David Smiley
        2. LUCENE-8344.patch
          7 kB
          David Smiley
        3. LUCENE-8344.patch
          5 kB
          David Smiley

          People

            Assignee: David Smiley (dsmiley)
            Reporter: David Smiley (dsmiley)
            Votes: 0
            Watchers: 6