  Lucene - Core / LUCENE-1321

Highlight fragment does not extend to maxDocCharsToAnalyze

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: None
    • Component/s: modules/highlighter
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      The current highlighter code checks whether the total length of the text to highlight is strictly smaller than maxDocCharsToAnalyze before appending any text that remains after the last token to the fragment. As a result, if maxDocCharsToAnalyze is set to exactly the length of the text, and the last token of the text is the term to highlight and is followed by non-token text, that trailing non-token text is left out of the fragment.

      For example, consider the phrase "this is a text with searchterm in it". "in" and "it" are not tokenized because they are stopwords. Setting maxDocCharsToAnalyze to 36 (the length of the phrase) and searching for "searchterm" gives a fragment ending in "searchterm". The expected behaviour is for the fragment to end in "in it", since setting maxDocCharsToAnalyze to the full length explicitly states that the whole phrase should be considered.
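
      The boundary condition described above can be modelled with a small standalone sketch. This is not the Lucene highlighter code: the class TrailingTextDemo, the method fragment and the inclusive flag are hypothetical names invented for illustration, and the model only assumes that text after the last token is appended when a length check against maxDocCharsToAnalyze passes.

```java
public class TrailingTextDemo {
    // Simplified stand-in for the highlighter's end-of-text handling (not Lucene code).
    // lastTokenEnd is the end offset of the last token the analyzer produced.
    // inclusive selects between the strict (<) and the inclusive (<=) length check.
    public static String fragment(String text, int lastTokenEnd,
                                  int maxDocCharsToAnalyze, boolean inclusive) {
        StringBuilder out = new StringBuilder(text.substring(0, lastTokenEnd));
        boolean withinLimit = inclusive
                ? text.length() <= maxDocCharsToAnalyze
                : text.length() < maxDocCharsToAnalyze;
        // Append the non-token tail only when the length check passes.
        if (lastTokenEnd < text.length() && withinLimit) {
            out.append(text.substring(lastTokenEnd));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "this is a text with searchterm in it";
        // "in" and "it" are stopwords, so the last token ends right after "searchterm".
        int lastTokenEnd = text.indexOf("searchterm") + "searchterm".length();
        int limit = text.length(); // 36, exactly the length of the phrase

        System.out.println(fragment(text, lastTokenEnd, limit, false)); // strict check
        System.out.println(fragment(text, lastTokenEnd, limit, true));  // inclusive check
    }
}
```

      With the limit equal to the text length, the strict check drops the trailing " in it" while the inclusive check keeps it, which is the difference this issue is about.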

      1. LUCENE-1321.patch
        2 kB
        Lars Kotthoff

        Activity

        markrmiller@gmail.com Mark Miller added a comment -

        Thanks Lars. Nice catch - not an easy spot <g> Looks good to me. When I get a few free minutes I'll go over it a bit more, but on first inspection, certainly looks like the right fix and all tests pass.

        larsko Lars Kotthoff added a comment -

        Patch changing "text.length() < maxDocCharsToAnalyze" to "text.length() <= maxDocCharsToAnalyze" and adding a unit test to verify this behaviour.


          People

          • Assignee:
            markrmiller@gmail.com Mark Miller
            Reporter:
            larsko Lars Kotthoff