  1. Lucene - Core
  2. LUCENE-2910

Highlighter does not correctly highlight a phrase around the 50th term

Details

    • Type: Bug
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: 2.9.4
    • Fix Version/s: None
    • Component/s: modules/highlighter
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      When the Highlighter is used with an N-gram tokenizer such as CJKTokenizer to highlight a phrase that appears around the 50th term in a field, the highlighted span is shorter than expected: the leading character of the phrase is left outside the highlight tags.

      For example, highlighting "fooo" in the following text with a bigram tokenizer:
      "0---------1---------2---------3---------4---------fooo---"
      
      Expected: "0---------1---------2---------3---------4---------<B>fooo</B>---"
      Actual: "0---------1---------2---------3---------4---------f<B>ooo</B>---"
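
The expected result above can be sketched in plain Java without Lucene. This is only an illustration of what a correct highlight over overlapping bigram tokens looks like, not the Highlighter's implementation and not the attached patch. The class `BigramHighlightSketch` and its helpers `bigrams` and `highlight` are hypothetical names introduced here; the sketch assumes a simple tokenizer in which token i covers characters [i, i+2).

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration; does NOT use Lucene's Highlighter or CJKTokenizer.
class BigramHighlightSketch {

    // Hypothetical helper: split a string into overlapping character bigrams.
    // Token i covers the character range [i, i + 2).
    static List<String> bigrams(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 2 <= s.length(); i++) {
            out.add(s.substring(i, i + 2));
        }
        return out;
    }

    // Hypothetical helper: wrap every character covered by a phrase match
    // in <B>...</B>, merging the overlapping bigram tokens into one span.
    static String highlight(String text, String phrase) {
        List<String> textGrams = bigrams(text);
        List<String> phraseGrams = bigrams(phrase);
        boolean[] marked = new boolean[text.length()];

        // A phrase match is a run of consecutive text bigrams equal to the
        // phrase's bigrams.
        for (int i = 0; i + phraseGrams.size() <= textGrams.size(); i++) {
            if (textGrams.subList(i, i + phraseGrams.size()).equals(phraseGrams)) {
                for (int j = i; j < i + phraseGrams.size(); j++) {
                    marked[j] = true;      // first character of token j
                    marked[j + 1] = true;  // second character of token j
                }
            }
        }

        // Emit the text, opening a tag where a marked run starts and
        // closing it where the run ends.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            boolean runStarts = marked[i] && (i == 0 || !marked[i - 1]);
            boolean runEnds = marked[i] && (i == text.length() - 1 || !marked[i + 1]);
            if (runStarts) sb.append("<B>");
            sb.append(text.charAt(i));
            if (runEnds) sb.append("</B>");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "0---------1---------2---------3---------4---------fooo---";
        System.out.println(highlight(text, "fooo"));
    }
}
```

In the sample text, "fooo" starts at character offset 50, so under this bigram scheme its first token "fo" is the 51st token of the field. The actual output in the report is what results when that first token is left out of the highlighted span.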
      

      Attachments

        1. HighlighterFix.patch
          3 kB
          Shinya Kasatani

          People

            Assignee: Unassigned
            Reporter: Shinya Kasatani (shinya)