  1. Solr
  2. SOLR-13103

UnifiedHighlighter Separator-based BreakIterator should work with Strings, not just a single character


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: highlighter
    • Labels: None

    Description

      For the `hl.bs.type` choice of SEPARATOR, it would be nice if we could support a string as the separator, not just a single character. Looking at the code, I see no reason strings can't be supported, other than a few signature changes on some constructors.

       

      My use case: my docs have section and page markers that make for conveniently sized passages for highlighting, but there really isn't any clean way to mark those sections with a single character. For instance, Tika will extract and mark pages with `<div class="page"><p/>....</div>`. If I could pass in that `<div class="page">` tag as my separator, I could then just highlight within a page.
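
      To make the request concrete, here is a minimal sketch of the boundary logic a string-based separator would need. It is plain Java and does not touch Lucene or Solr code; the class `StringSeparatorSketch` and its methods are hypothetical names used only for illustration. Placing each boundary at the start of a separator occurrence (so every passage begins with its page tag) is an assumption made for this sketch, not a description of how the existing single-character iterator behaves.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of Lucene or Solr. */
final class StringSeparatorSketch {

    /**
     * Returns passage boundaries for the given text: offset 0, the start
     * offset of every separator occurrence after position 0, and the end
     * of the text. Assumes a non-empty separator.
     */
    static List<Integer> boundaries(String text, String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<Integer> result = new ArrayList<>();
        result.add(0);
        int hit = text.indexOf(separator);
        while (hit >= 0) {
            if (hit > 0) {
                result.add(hit); // a new passage starts at the separator
            }
            // Continue searching past this occurrence (non-overlapping matches).
            hit = text.indexOf(separator, hit + separator.length());
        }
        if (result.get(result.size() - 1) != text.length()) {
            result.add(text.length());
        }
        return result;
    }

    /** Cuts the text into passages at the boundaries computed above. */
    static List<String> passages(String text, String separator) {
        List<Integer> b = boundaries(text, separator);
        List<String> out = new ArrayList<>();
        for (int i = 1; i < b.size(); i++) {
            out.add(text.substring(b.get(i - 1), b.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        String sep = "<div class=\"page\">";
        String text = sep + "<p/>first page</div>" + sep + "<p/>second page</div>";
        // Each printed line is one page-sized passage.
        for (String passage : passages(text, sep)) {
            System.out.println(passage);
        }
    }
}
```

      In an actual patch, the same search would presumably live inside a `java.text.BreakIterator` implementation so the highlighter can consume it, with the constructor accepting a `String` instead of a `char`.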


          People

            Assignee: Unassigned
            Reporter: Grant Ingersoll (gsingers)
