  1. Lucene - Core
  2. LUCENE-9093

Unified highlighter with word separator never gives context to the left

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.5
    • Component/s: modules/highlighter
    • Labels: None
    • Lucene Fields: New

    Description

      When using the unified highlighter with hl.bs.type=WORD, I am not able to get context to the left of the matches returned; only words to the right of each match are shown. I see this behaviour on both Solr 6.4 and Solr 7.1.

      Without context to the left of a match, the highlighted snippets are much less useful for understanding where the match appears in a document.

      As an example, using the techproducts data with Solr 7.1, given a search for "apple", highlighting the "features" field:

      http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.bs.type=WORD&hl.fragsize=30&hl.method=unified

      I see this snippet:

      "<em>Apple</em> Lossless, H.264 video"

      Note that "Apple" is anchored to the left edge of the snippet. Compare with the original highlighter:

      http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.fragsize=30

      Here the match has context on either side:

      ", Audible, <em>Apple</em> Lossless, H.264 video"

      (To complicate matters, I am not sure the unified highlighter respects the hl.fragsize parameter at all, although SOLR-9935 suggests support was added. I included hl.fragsize in the unified URL as well, but it makes no difference unless it is set to 0.)
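
The comparison in the description can be scripted. The following is a minimal Python sketch, not part of the original report: the helper names build_highlight_url and left_context are hypothetical, and it assumes the techproducts example at localhost:8983 used in the URLs above. It only builds the two request URLs and extracts the text that precedes the first <em> tag in a snippet; it does not contact Solr.

```python
from urllib.parse import urlencode

# Select endpoint taken from the URLs in the report.
BASE = "http://localhost:8983/solr/techproducts/select"


def build_highlight_url(unified, fragsize=30):
    """Build one of the two highlighting requests compared in the report.

    Hypothetical helper: unified=True adds hl.bs.type=WORD and
    hl.method=unified; unified=False uses the original highlighter.
    """
    params = {"hl.fl": "features", "hl": "on", "q": "apple"}
    if unified:
        params["hl.bs.type"] = "WORD"
    params["hl.fragsize"] = fragsize
    if unified:
        params["hl.method"] = "unified"
    return BASE + "?" + urlencode(params)


def left_context(snippet, open_tag="<em>"):
    """Return the text before the first highlighted match ("" if there is none)."""
    pos = snippet.find(open_tag)
    return snippet[:pos] if pos > 0 else ""


if __name__ == "__main__":
    # Snippets quoted in the report for each highlighter.
    unified_snippet = "<em>Apple</em> Lossless, H.264 video"
    original_snippet = ", Audible, <em>Apple</em> Lossless, H.264 video"
    for name, unified, snippet in [
        ("unified", True, unified_snippet),
        ("original", False, original_snippet),
    ]:
        print(name, build_highlight_url(unified))
        print("  left context:", repr(left_context(snippet)))
```

Changing the fragsize argument while keeping unified=True gives a quick way to check whether hl.fragsize has any effect on the returned snippets.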

          People

            Assignee: David Smiley (dsmiley)
            Reporter: Tim Retout (timretout)
            Votes: 0
            Watchers: 6

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 8h 10m
