Lucene - Core
LUCENE-2910

Highlighter does not correctly highlight the phrase around 50th term

    Details

    • Type: Bug
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: 2.9.4
    • Fix Version/s: None
    • Component/s: modules/highlighter
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      When the Highlighter is used with an N-gram tokenizer such as CJKTokenizer, a phrase that appears around the 50th term in the field is highlighted incorrectly: the highlighted span is shorter than the phrase.

      e.g. Highlighting "fooo" in the following text with a bigram tokenizer:
      "0---------1---------2---------3---------4---------fooo---"
      
      Expected: "0---------1---------2---------3---------4---------<B>fooo</B>---"
      Actual: "0---------1---------2---------3---------4---------f<B>ooo</B>---"
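
      To make the expected behaviour concrete, the following is a minimal, library-independent sketch in Python. It does not use Lucene and does not reproduce the bug; it only shows how overlapping character bigrams of the sample text line up with "fooo" and what the merged highlight should look like. The functions bigrams and highlight_phrase are hypothetical helpers written for this illustration, and the tokenizer is assumed to emit one plain two-character term per offset, which may differ from what CJKTokenizer actually produces.

```python
def bigrams(text):
    """Return (term, start, end) for every overlapping character bigram."""
    return [(text[i:i + 2], i, i + 2) for i in range(len(text) - 1)]


def highlight_phrase(text, phrase, pre="<B>", post="</B>"):
    """Wrap each run of consecutive bigrams that spells `phrase` in tags.

    Hypothetical helper for illustration only; this is not the Lucene
    Highlighter algorithm.
    """
    if len(phrase) < 2:
        raise ValueError("phrase must contain at least one bigram")

    tokens = bigrams(text)
    wanted = [term for term, _, _ in bigrams(phrase)]

    # Find every window of consecutive tokens whose terms equal the
    # phrase's bigrams, and record the merged character span.
    spans = []
    for i in range(len(tokens) - len(wanted) + 1):
        window = tokens[i:i + len(wanted)]
        if [term for term, _, _ in window] == wanted:
            spans.append((window[0][1], window[-1][2]))

    # Rebuild the text, skipping spans that overlap one already emitted.
    out, last = [], 0
    for start, end in spans:
        if start < last:
            continue
        out.append(text[last:start])
        out.append(pre + text[start:end] + post)
        last = end
    out.append(text[last:])
    return "".join(out)


text = "0---------1---------2---------3---------4---------fooo---"
print(highlight_phrase(text, "fooo"))
```

      Under these assumptions the first bigram of the phrase, "fo", starts at character offset 50, so it is preceded by exactly 50 bigram terms, and the merged span covers all four characters of "fooo" as in the Expected line above.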
      
      1. HighlighterFix.patch
        3 kB
        Shinya Kasatani

        Activity

        Shinya Kasatani created issue -
        Shinya Kasatani made changes -
        Field: Attachment
        Original Value: (none)
        New Value: HighlighterFix.patch [ 12470425 ]
        Shinya Kasatani made changes -
        Field: Description
        Change: the example (the "e.g." line, the sample text, and the Expected and Actual lines) was wrapped in {noformat} markup; the wording is otherwise unchanged.

          People

          • Assignee: Unassigned
          • Reporter: Shinya Kasatani
          • Votes: 0
          • Watchers: 2
