  1. Lucene - Core
  2. LUCENE-4955

NGramTokenFilter increments positions for each gram

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.3
    • Fix Version/s: 4.4, 6.0
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      NGramTokenFilter increments positions for each gram rather than for the actual token, which can lead to rather funny problems, especially with highlighting. Whether this filter should be used for highlighting is a different story, but today it seems to be common practice in many situations to highlight sub-term matches.

      I have a highlighting test that uses ngrams and fails with a StringIndexOutOfBoundsException: tokens are sorted by position, which causes offsets to be mixed up because of the ngram token filter.
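      To make the problem concrete, here is a minimal stdlib-only sketch. It is not Lucene code and not the attached patch. The class NGramPositionSketch, its Gram type, and the gram emission order (all grams of one size, then the next size) are assumptions made for illustration. It models a filter that bumps the position once per gram versus once per token, then checks whether start offsets still agree with position order.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramPositionSketch {

    /** One emitted gram: its text, absolute position, and offsets into the input. */
    static final class Gram {
        final String text;
        final int position;
        final int startOffset;
        final int endOffset;

        Gram(String text, int position, int startOffset, int endOffset) {
            this.text = text;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return text + "@pos" + position + "[" + startOffset + "," + endOffset + ")";
        }
    }

    /**
     * Splits the input on single spaces and emits every gram of length minGram..maxGram.
     * perGram == true: the position advances for each gram (the behaviour reported here).
     * perGram == false: the position advances once per token and its grams share it.
     */
    static List<Gram> grams(String input, int minGram, int maxGram, boolean perGram) {
        List<Gram> out = new ArrayList<>();
        int position = -1;
        int tokenStart = 0;
        for (String token : input.split(" ")) {
            if (!perGram) {
                position++; // one position per original token
            }
            // Assumed emission order: all grams of one size, then the next size.
            for (int size = minGram; size <= maxGram; size++) {
                for (int i = 0; i + size <= token.length(); i++) {
                    if (perGram) {
                        position++; // one position per gram
                    }
                    out.add(new Gram(token.substring(i, i + size), position,
                            tokenStart + i, tokenStart + i + size));
                }
            }
            tokenStart += token.length() + 1; // skip the separating space
        }
        return out;
    }

    /** True if no gram at a higher position starts before a gram at a lower position. */
    static boolean offsetsFollowPositions(List<Gram> grams) {
        for (Gram a : grams) {
            for (Gram b : grams) {
                if (a.position < b.position && a.startOffset > b.startOffset) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Gram> perGram = grams("ab cd", 1, 2, true);
        List<Gram> perToken = grams("ab cd", 1, 2, false);
        System.out.println(perGram + " -> " + offsetsFollowPositions(perGram));
        System.out.println(perToken + " -> " + offsetsFollowPositions(perToken));
    }
}
```

      In this model, "ab cd" with grams of size 1 to 2 gets positions 0 through 5 when the position advances per gram, and the gram "ab" at position 2 starts at offset 0, before "b" at position 1, which starts at offset 1. A consumer that orders tokens by position and assumes offsets only move forward sees them go backwards. With one position per token the positions are 0, 0, 0, 1, 1, 1 and the check passes.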

        Attachments

        1. highlighter-test.patch
          4 kB
          Simon Willnauer
        2. highlighter-test.patch
          3 kB
          Simon Willnauer
        3. LUCENE-4955.patch
          42 kB
          Adrien Grand
        4. LUCENE-4955.patch
          12 kB
          Simon Willnauer


              People

              • Assignee:
                Adrien Grand (jpountz)
              • Reporter:
                Simon Willnauer (simonw)
