Lucene - Core / LUCENE-1822

FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.9
    • Fix Version/s: 4.1, 6.0
    • Component/s: modules/highlighter
    • Labels:
      None
    • Environment:
      any

    • Lucene Fields:
      New, Patch Available

      Description

      The new FastVectorHighlighter performs extremely well; however, I've found in testing that the window of text chosen for each fragment is often very poor, because SimpleFragListBuilder hard-codes the fragment to always start 6 characters to the left of the first phrase match in that fragment. When selecting long fragments, this often means there is barely any context before the highlighted word and lots after it. Even worse, when highlighting a phrase at the end of a short text, the beginning of the text is cut off even though the entire text would fit within the specified fragCharSize. For example, highlighting "Punishment" in "Crime and Punishment" returns "e and <b>Punishment</b>" no matter what fragCharSize is specified.

      I am going to attach a patch that improves text-window selection by recalculating the starting margin once all phrases in the fragment have been identified. With this change, if a single word is matched in a fragment, it appears in the middle of the fragment instead of 6 characters from the beginning. It also means one can guarantee that a short text is represented in its entirety in a fragment by specifying a large enough fragCharSize.
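      The window-selection problem described above can be made concrete with a small, self-contained Java sketch. It contrasts the fixed 6-character margin the description reports with a window that splits the unused fragCharSize evenly before and after the matched span and then clamps it to the text bounds, which is one plausible reading of "recalculating the starting margin". MarginSketch, naiveStart, centeredStart and fragment are hypothetical names for illustration only; they are not Lucene's SimpleFragListBuilder API, and the attached patch may compute the margin differently.

```java
// Hypothetical sketch: these names are illustrative and are NOT part of
// Lucene's FastVectorHighlighter / SimpleFragListBuilder API.
public class MarginSketch {

    /** Behavior reported in the issue: always start a fixed margin before the first match. */
    public static int naiveStart(int firstMatchStart, int margin) {
        return Math.max(0, firstMatchStart - margin);
    }

    /**
     * Assumed centering approach: once the first and last match offsets in the
     * fragment are known, split the unused space evenly around them, then keep
     * the window inside the text.
     */
    public static int centeredStart(int firstMatchStart, int lastMatchEnd,
                                    int fragCharSize, int textLength) {
        int matchedWidth = lastMatchEnd - firstMatchStart;
        int slack = Math.max(0, fragCharSize - matchedWidth);
        int start = firstMatchStart - slack / 2;
        // If the window would run past the end of the text, shift it left.
        start = Math.min(start, textLength - fragCharSize);
        // Never start before the beginning of the text.
        return Math.max(0, start);
    }

    /** Returns text[start, start + fragCharSize) with the match wrapped in <b> tags. */
    public static String fragment(String text, int start, int fragCharSize,
                                  int matchStart, int matchEnd) {
        int end = Math.min(text.length(), start + fragCharSize);
        return text.substring(start, matchStart)
                + "<b>" + text.substring(matchStart, matchEnd) + "</b>"
                + text.substring(matchEnd, end);
    }

    public static void main(String[] args) {
        String text = "Crime and Punishment";
        int matchStart = text.indexOf("Punishment");          // 10
        int matchEnd = matchStart + "Punishment".length();    // 20
        int fragCharSize = 100;

        int naive = naiveStart(matchStart, 6);
        // prints "e and <b>Punishment</b>"
        System.out.println(fragment(text, naive, fragCharSize, matchStart, matchEnd));

        int centered = centeredStart(matchStart, matchEnd, fragCharSize, text.length());
        // prints "Crime and <b>Punishment</b>"
        System.out.println(fragment(text, centered, fragCharSize, matchStart, matchEnd));
    }
}
```

      With a longer text the same centering rule places a single match near the middle of the window: for a 200-character text with a match at offsets 100 to 110 and fragCharSize 50, centeredStart returns 80 (20 characters of context on each side), whereas the fixed margin starts at 94.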

        Attachments

        1. LUCENE-1822.patch
          9 kB
          Koji Sekiguchi
        2. LUCENE-1822.patch
          2 kB
          Koji Sekiguchi
        3. LUCENE-1822.patch
          2 kB
          Alex Vigdor
        4. LUCENE-1822-tests.patch
          7 kB
          Arcadius Ahouansou

              People

              • Assignee:
                Koji Sekiguchi
                Reporter:
                Alex Vigdor
              • Votes:
                4
                Watchers:
                6

                Dates

                • Created:
                  Updated:
                  Resolved: