Lucene - Core / LUCENE-8344

TokenStreamToAutomaton doesn't ignore trailing posInc when preservePositionIncrements=false

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.4
    • Component/s: modules/suggest
    • Labels: None
    • Lucene Fields: New

      Description

      TokenStreamToAutomaton (TS2A) in Lucene core is used by the AnalyzingSuggester (including its FuzzySuggester subclass), the NRT Document Suggester, and soon the SolrTextTagger. It has a setting, preservePositionIncrements, that defaults to true. If it is set to false (e.g. to ignore stopwords) and there is a trailing position increment greater than 1, TS2A still adds position increments (holes) to the automaton even though it was configured not to.
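
      To make the reported behavior concrete, here is a minimal Java sketch. It is a simplified, hypothetical model, not Lucene's TokenStreamToAutomaton: it flattens a single-path token stream into a list of labels instead of building an automaton, and the names Token, HOLE, toLabels and applyFlagToTrailing are all invented for illustration. In this model a token with position increment n is preceded by n - 1 holes, and finalPosInc (assumed to be the increment reported at end of stream) counts the positions skipped after the last token. Passing applyFlagToTrailing=false reproduces the reported behavior, where trailing holes survive preservePositionIncrements=false.

```java
import java.util.ArrayList;
import java.util.List;

public class TrailingHoleSketch {
    // Hypothetical simplified token: term text plus its position increment.
    record Token(String term, int posInc) {}

    // Hypothetical label standing in for a hole (a skipped position).
    static final String HOLE = "<HOLE>";

    /**
     * Flattens a single-path token stream into a label sequence (a model only).
     * A token with posInc > 1 is preceded by (posInc - 1) holes; finalPosInc is
     * the number of positions skipped after the last token.
     *
     * @param applyFlagToTrailing true = intended behavior (the flag also governs
     *        the trailing increment); false = reported behavior (trailing holes
     *        are emitted even when preservePositionIncrements is false)
     */
    static List<String> toLabels(List<Token> tokens, int finalPosInc,
                                 boolean preservePositionIncrements,
                                 boolean applyFlagToTrailing) {
        List<String> labels = new ArrayList<>();
        for (Token t : tokens) {
            int posInc = t.posInc();
            if (!preservePositionIncrements && posInc > 1) {
                posInc = 1; // collapse the gap left by removed tokens
            }
            for (int i = 1; i < posInc; i++) {
                labels.add(HOLE);
            }
            labels.add(t.term());
        }
        int trailing = finalPosInc;
        if (!preservePositionIncrements && applyFlagToTrailing) {
            trailing = 0; // ignore the trailing gap too, as the flag requests
        }
        for (int i = 0; i < trailing; i++) {
            labels.add(HOLE);
        }
        return labels;
    }

    public static void main(String[] args) {
        // "lord of the" with the stopwords "of" and "the" removed:
        // one token, then two skipped positions at end of stream.
        List<Token> tokens = List.of(new Token("lord", 1));
        System.out.println(toLabels(tokens, 2, true, true));   // [lord, <HOLE>, <HOLE>]
        System.out.println(toLabels(tokens, 2, false, false)); // [lord, <HOLE>, <HOLE>] (reported)
        System.out.println(toLabels(tokens, 2, false, true));  // [lord] (intended)
    }
}
```

      The three calls in main differ only in their flags: the intended variant returns just [lord], while the reported variant still appends two holes even though position increments are not being preserved.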

      I'm filing this issue separately from LUCENE-8332, where I first found it, because although the fix is very simple, I'm concerned about its back-compat ramifications. I'll attach a patch to show the problem.

        Attachments

        1. LUCENE-8344.patch
          16 kB
          David Smiley
        2. LUCENE-8344.patch
          7 kB
          David Smiley
        3. LUCENE-8344.patch
          5 kB
          David Smiley


            People

            • Assignee: David Smiley (dsmiley)
            • Reporter: David Smiley (dsmiley)
            • Votes: 0
            • Watchers: 6
