  1. Lucene - Core
  2. LUCENE-9093

Unified highlighter with word separator never gives context to the left

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.5
    • Component/s: modules/highlighter
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      When using the unified highlighter with hl.bs.type=WORD, I am not able to get context to the left of the matches returned; only words to the right of each match are shown. I see this behaviour on both Solr 6.4 and Solr 7.1.

      Without context to the left of a match, the highlighted snippets are much less useful for understanding where the match appears in a document.

      As an example, using the techproducts data with Solr 7.1, given a search for "apple", highlighting the "features" field:

      http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.bs.type=WORD&hl.fragsize=30&hl.method=unified

      I see this snippet:

      "<em>Apple</em> Lossless, H.264 video"

      Note that "Apple" is anchored to the left. Compare with the original highlighter:

      http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.fragsize=30

      And the match has context on either side:

      ", Audible, <em>Apple</em> Lossless, H.264 video"
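The two requests above differ only in the parameters that select the unified highlighter. As a minimal sketch (the helper name `build_url` and the variable names are illustrative assumptions, not part of Solr), the URLs can be built from shared parameters so the difference is explicit:

```python
from urllib.parse import urlencode

# Base select endpoint, copied from the URLs in the report.
BASE = "http://localhost:8983/solr/techproducts/select"

# Parameters shared by both requests.
common = {"hl.fl": "features", "hl": "on", "q": "apple", "hl.fragsize": "30"}

# Extra parameters that switch to the unified highlighter with a WORD break iterator.
unified = {**common, "hl.bs.type": "WORD", "hl.method": "unified"}


def build_url(params):
    """Hypothetical helper: append URL-encoded query parameters to BASE."""
    return BASE + "?" + urlencode(params)


original_url = build_url(common)
unified_url = build_url(unified)

# The only parameters that differ between the two requests.
extra = {k: v for k, v in unified.items() if k not in common}

print(original_url)
print(unified_url)
print(extra)
```

Parameter order in the generated unified URL differs from the one quoted in the report, which should not matter for a query string.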

      (To complicate matters, I am not sure the unified highlighter respects the hl.fragsize parameter at all, although SOLR-9935 suggests support was added. I included hl.fragsize in the unified URL too, but it makes no difference unless set to 0.)
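To make the requested behaviour concrete, here is a rough sketch of a fragment cut that spends a character budget on both sides of the match and then snaps to whitespace. This is not Lucene's or Solr's algorithm; the function `snippet_with_context` and its `fragsize` argument are hypothetical, and the sample text is pieced together from the snippets quoted above.

```python
import re


def snippet_with_context(text, term, fragsize):
    """Hypothetical helper: wrap the first match of `term` in <em> tags and
    keep roughly `fragsize` characters, split between left and right context."""
    m = re.search(re.escape(term), text, flags=re.IGNORECASE)
    if m is None:
        return None

    # Budget left over after the match itself, shared between both sides.
    spare = max(fragsize - (m.end() - m.start()), 0)
    start = max(m.start() - spare // 2, 0)
    end = min(m.end() + (spare - spare // 2), len(text))

    # If the left edge falls inside a word, move it forward to the next word start.
    if start > 0 and not text[start - 1].isspace():
        cut = text.find(" ", start, m.start())
        start = cut + 1 if cut != -1 else m.start()

    # If the right edge falls inside a word, move it back to the previous space.
    if end < len(text) and not text[end].isspace() and not text[end - 1].isspace():
        cut = text.rfind(" ", m.end(), end)
        if cut != -1:
            end = cut

    return text[start:m.start()] + "<em>" + m.group(0) + "</em>" + text[m.end():end]


sample = "Audible, Apple Lossless, H.264 video"
result = snippet_with_context(sample, "apple", 30)
print(result)
```

With a budget of 30 characters the match keeps words on both sides, which is the kind of snippet the report asks for from hl.bs.type=WORD.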

        Attachments

        1. LUCENE-9093.patch
          22 kB
          Nándor Mátravölgyi

              People

              • Assignee: David Smiley (dsmiley)
              • Reporter: Tim Retout (timretout)
              • Votes: 0
              • Watchers: 6

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 8h 10m