Lucene - Core / LUCENE-4955

NGramTokenFilter increments positions for each gram

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.3
    • Fix Version/s: 4.4, 6.0
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      NGramTokenFilter increments the position for each gram rather than once for the actual token, which can lead to rather funny problems, especially with highlighting. Whether this filter should be used for highlighting is a different story, but today it seems to be common practice to use it to highlight sub-term matches.

      I have a highlighting test that uses n-grams and fails with a StringIndexOutOfBoundsException: the tokens are sorted by position, and because of the NGramTokenFilter the offsets end up out of order.
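
      To make the report concrete, here is a minimal, Lucene-free sketch in plain Java. It is not the filter's source and not the attached patch: the `Gram` class, the `grams` helper and its `perGramIncrement` flag are hypothetical names made up for illustration, and the shortest-first emission order is an assumption about how the filter walks a token. It only models the two behaviours named in the title: one position increment per gram versus one per original token.

```java
import java.util.ArrayList;
import java.util.List;

class NGramPositionSketch {

    /** Toy token: term text, position increment, and character offsets. */
    static final class Gram {
        final String term;
        final int posInc;
        final int start;
        final int end;

        Gram(String term, int posInc, int start, int end) {
            this.term = term;
            this.posInc = posInc;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Emits the grams of a single token, shortest grams first (assumed order).
     * perGramIncrement = true  models the reported behaviour: every gram advances the position.
     * perGramIncrement = false models one increment per token: only the first gram advances it.
     */
    static List<Gram> grams(String token, int minGram, int maxGram, boolean perGramIncrement) {
        List<Gram> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= token.length(); start++) {
                int posInc = (perGramIncrement || out.isEmpty()) ? 1 : 0;
                out.add(new Gram(token.substring(start, start + n), posInc, start, start + n));
            }
        }
        return out;
    }

    /** Absolute positions: running sum of increments, so the first gram lands at position 0. */
    static int[] positions(List<Gram> grams) {
        int[] pos = new int[grams.size()];
        int current = -1;
        for (int i = 0; i < grams.size(); i++) {
            current += grams.get(i).posInc;
            pos[i] = current;
        }
        return pos;
    }

    /** True when start offsets never go backwards in emission order. */
    static boolean offsetsNonDecreasing(List<Gram> grams) {
        for (int i = 1; i < grams.size(); i++) {
            if (grams.get(i).start < grams.get(i - 1).start) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (boolean perGram : new boolean[] {true, false}) {
            List<Gram> grams = grams("abcd", 1, 2, perGram);
            int[] pos = positions(grams);
            System.out.println(perGram ? "increment per gram:" : "increment per token:");
            for (int i = 0; i < grams.size(); i++) {
                Gram g = grams.get(i);
                System.out.println("  pos=" + pos[i] + " term=" + g.term
                        + " offsets=[" + g.start + "," + g.end + ")");
            }
        }
    }
}
```

      In the per-gram mode the single token "abcd" is spread over positions 0 to 6, and the start offset drops back to 0 at position 4 (the gram "ab"). A consumer that walks tokens in position order and slices the text by offset may not expect that. In the per-token mode all seven grams share position 0.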

      Attachments

        1. LUCENE-4955.patch
          12 kB
          Simon Willnauer
        2. LUCENE-4955.patch
          42 kB
          Adrien Grand
        3. highlighter-test.patch
          3 kB
          Simon Willnauer
        4. highlighter-test.patch
          4 kB
          Simon Willnauer

            People

              Assignee: Adrien Grand (jpountz)
              Reporter: Simon Willnauer (simonw)