  1. Solr
  2. SOLR-13103

UnifiedHighlighter Separator-based BreakIterator should work with Strings, not just a single character


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: highlighter
    • Labels:
      None

      Description

      For the `hl.bs.type` choice of SEPARATOR, it would be nice to support a string as the separator, not just a single character.  Looking at the code, I see no reason Strings can't be supported other than a few signature changes on some constructors.

       

      My use case: I have docs with section and page markers that make for conveniently sized passages for highlighting, but there isn't any clean way to mark those sections with a single character.  For instance, Tika will extract and mark pages with `<div class="page"><p/>....</div>`.  If I could pass in that `<div class="page">` tag as my separator, I could then highlight within a single page.
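
      To make the request concrete, here is a minimal sketch of a string-based separator iterator built only on `java.text.BreakIterator`. It is not the existing Lucene/Solr code: the class name `StringSeparatorBreakIterator` is hypothetical, and placing each boundary immediately after a separator occurrence is an assumption made for this sketch. It also assumes the separator does not overlap with itself and that the supplied `CharacterIterator` starts at index 0.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

/** Hypothetical sketch: breaks text after each occurrence of a separator String. */
class StringSeparatorBreakIterator extends BreakIterator {
    private final String separator;
    private String text = "";
    private CharacterIterator iter = new StringCharacterIterator("");
    private int current = 0;

    StringSeparatorBreakIterator(String separator) {
        if (separator == null || separator.isEmpty()) {
            throw new IllegalArgumentException("separator must be non-empty");
        }
        this.separator = separator;
    }

    @Override public int current() { return current; }
    @Override public int first() { current = 0; return current; }
    @Override public int last() { current = text.length(); return current; }
    @Override public int next() { return following(current); }

    @Override public int following(int offset) {
        if (offset < 0 || offset > text.length()) {
            throw new IllegalArgumentException("offset out of bounds: " + offset);
        }
        if (offset == text.length()) {
            current = text.length();
            return DONE;
        }
        // A boundary sits at hit + separator.length(); it must be > offset,
        // so the earliest qualifying hit starts at offset - length + 1.
        int from = Math.max(0, offset - separator.length() + 1);
        int hit = text.indexOf(separator, from);
        current = (hit < 0) ? text.length() : hit + separator.length();
        return current;
    }

    @Override public int previous() {
        if (current == 0) {
            return DONE;
        }
        // Last separator whose end lies strictly before the current boundary.
        int hit = text.lastIndexOf(separator, current - separator.length() - 1);
        current = (hit < 0) ? 0 : hit + separator.length();
        return current;
    }

    @Override public int next(int n) {
        int result = current;
        for (int i = 0; i < n && result != DONE; i++) result = next();
        for (int i = 0; i > n && result != DONE; i--) result = previous();
        return result;
    }

    @Override public CharacterIterator getText() { return iter; }

    @Override public void setText(CharacterIterator newText) {
        // Copy the characters into a String so indexOf/lastIndexOf can be used.
        StringBuilder sb = new StringBuilder();
        for (char c = newText.first(); c != CharacterIterator.DONE; c = newText.next()) {
            sb.append(c);
        }
        newText.first();
        text = sb.toString();
        iter = newText;
        current = 0;
    }
}
```

      With something like this, a separator such as `<div class="page">` could be passed to the constructor, and each call to `next()` would advance to the position just past the next page marker (or to the end of the text when no marker remains).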


            People

            • Assignee:
              Unassigned
              Reporter:
              gsingers Grant Ingersoll
