SOLR-13103

UnifiedHighlighter Separator-based BreakIterator should work with Strings, not just a single character


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: highlighter
    • Labels: None

    Description

      For the `hl.bs.type` choice of SEPARATOR, it would be nice to support not just a single character but a whole string as the separator.  Looking at the code, I see no reason strings can't be supported other than a few signature changes on some constructors.

       

      My use case: I have docs with section and page markers that make for conveniently sized passages for highlighting, but there really isn't any clean way to mark those sections with a single character.  For instance, Tika will extract and mark pages with `<div class="page"><p/>....</div>`.  If I could pass in that `<div class="page">` tag as my separator, I could then just highlight within a page.
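      To make the request concrete, here is a rough sketch of what a string-based separator iterator might look like. It is built only on the JDK's `java.text.BreakIterator`; the class name `StringSeparatorBreakIterator` is hypothetical and is not part of Lucene or Solr, and nothing here is wired into the UnifiedHighlighter. One assumption to flag: this sketch breaks at the start of each separator occurrence, so every passage begins with its page marker. Breaking after the separator would be an equally reasonable choice.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, not part of Lucene or Solr: breaks text at the start of
// every occurrence of a multi-character separator string.
class StringSeparatorBreakIterator extends BreakIterator {
  private final String separator;
  private CharacterIterator text = new StringCharacterIterator("");
  private int[] bounds = {0}; // sorted boundary offsets; first and last are the text limits
  private int pos = 0;        // index of the current boundary within bounds

  StringSeparatorBreakIterator(String separator) {
    if (separator == null || separator.isEmpty()) {
      throw new IllegalArgumentException("separator must be a non-empty string");
    }
    this.separator = separator;
  }

  @Override
  public void setText(CharacterIterator newText) {
    text = newText;
    int begin = newText.getBeginIndex();
    int end = newText.getEndIndex();
    // Copy characters out by index so a literal U+FFFF is not mistaken for DONE.
    StringBuilder sb = new StringBuilder(end - begin);
    for (int i = begin; i < end; i++) {
      sb.append(newText.setIndex(i));
    }
    newText.setIndex(begin);
    String s = sb.toString();

    List<Integer> found = new ArrayList<>();
    found.add(begin);
    for (int i = s.indexOf(separator); i >= 0; i = s.indexOf(separator, i + separator.length())) {
      if (i > 0) found.add(begin + i); // a separator at offset 0 coincides with the start
    }
    if (end > begin) found.add(end);
    bounds = found.stream().mapToInt(Integer::intValue).toArray();
    pos = 0;
  }

  @Override public CharacterIterator getText() { return text; }
  @Override public int current() { return bounds[pos]; }
  @Override public int first() { pos = 0; return bounds[pos]; }
  @Override public int last() { pos = bounds.length - 1; return bounds[pos]; }
  @Override public int next() { return next(1); }
  @Override public int previous() { return next(-1); }

  @Override
  public int next(int n) {
    int target = pos + n;
    if (target < 0 || target >= bounds.length) {
      pos = target < 0 ? 0 : bounds.length - 1; // clamp to the first or last boundary
      return DONE;
    }
    pos = target;
    return bounds[pos];
  }

  @Override
  public int following(int offset) {
    checkOffset(offset);
    for (int i = 0; i < bounds.length; i++) {
      if (bounds[i] > offset) { pos = i; return bounds[i]; }
    }
    pos = bounds.length - 1;
    return DONE;
  }

  @Override
  public int preceding(int offset) {
    checkOffset(offset);
    for (int i = bounds.length - 1; i >= 0; i--) {
      if (bounds[i] < offset) { pos = i; return bounds[i]; }
    }
    pos = 0;
    return DONE;
  }

  private void checkOffset(int offset) {
    if (offset < bounds[0] || offset > bounds[bounds.length - 1]) {
      throw new IllegalArgumentException("offset out of bounds: " + offset);
    }
  }
}
```

      With `new StringSeparatorBreakIterator("<div class=\"page\">")` and `setText(...)` on the Tika output, each `first()`/`next()` pair would delimit one page. The sketch precomputes all boundaries in `setText`, which keeps the navigation methods trivial at the cost of one pass over the text up front.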

          People

            Assignee: Unassigned
            Reporter: Grant Ingersoll (gsingers)
            Votes: 0
            Watchers: 2
