[LUCENE-7620] UnifiedHighlighter: add target character width BreakIterator wrapper - ASF JIRA


Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 6.4
Component/s: modules/highlighter
Labels: None
Lucene Fields: New

Description

The original Highlighter includes a SimpleFragmenter that delineates fragments (aka Passages) by a character width. The default is 100 characters.
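For contrast with what follows, here is a simplified sketch of purely width-based fragmenting: the text is cut every `width` characters with no regard for sentence or word boundaries. The class `FixedWidthFragments` is hypothetical and is not Lucene's SimpleFragmenter, which (as far as we know) decides fragment boundaries from token offsets during analysis rather than slicing raw characters. It only illustrates why fixed-width passages tend to start and end awkwardly.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of fixed-width fragmenting: cut the text every
 * `width` characters, ignoring sentence and word boundaries.
 * Not Lucene's SimpleFragmenter, which works on token offsets.
 */
public class FixedWidthFragments {

    static List<String> fragments(String text, int width) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start += width) {
            // The last fragment may be shorter than `width`.
            out.add(text.substring(start, Math.min(start + width, text.length())));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps. It lands softly. Then it runs away.";
        for (String f : fragments(text, 20)) {
            System.out.println("[" + f + "]");
        }
    }
}
```

With a width of 20, one fragment ends with "soft" and the next begins with "ly.", which is exactly the mid-sentence (here even mid-word) breaking the proposal below wants to avoid.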

It would be great to support something similar for the UnifiedHighlighter (UH). It's useful in its own right, and of course it helps users transition to the UH. I'd like to implement it as a wrapper around another BreakIterator, perhaps a sentence one. That way you get back Passages made of whole sentences, so they look nice instead of breaking midway through a sentence, and you still get some control by specifying a target number of characters. This BreakIterator wouldn't be a general-purpose java.text.BreakIterator, since it would assume it is called exactly the way the UnifiedHighlighter calls it. It would probably be compatible with the PostingsHighlighter too.
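The grouping idea can be sketched with the JDK's sentence BreakIterator alone: keep appending whole sentences to a passage until it reaches the target width, then start a new one. This is a minimal sketch under stated assumptions, not the code in the attached LengthGoalBreakIterator patch. The class `SentenceGroupingSketch` and its `passages` method are hypothetical, and unlike the proposed wrapper it precomputes passages over the whole text instead of answering the specific calls the UnifiedHighlighter makes on a BreakIterator.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch: group whole sentences into passages of at least
 * targetWidth characters. Not Lucene's LengthGoalBreakIterator.
 */
public class SentenceGroupingSketch {

    /** Returns [start, end) offsets of passages that end on sentence boundaries. */
    static List<int[]> passages(String text, int targetWidth) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(text);
        List<int[]> out = new ArrayList<>();
        int passageStart = sentences.first();
        int end = sentences.next();
        while (end != BreakIterator.DONE) {
            // Close the passage at the first sentence boundary that reaches the target width.
            if (end - passageStart >= targetWidth) {
                out.add(new int[] {passageStart, end});
                passageStart = end;
            }
            end = sentences.next();
        }
        // Flush any trailing sentences that never reached the target width.
        if (passageStart < text.length()) {
            out.add(new int[] {passageStart, text.length()});
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps. It lands softly. Then it runs away.";
        for (int[] p : passages(text, 30)) {
            System.out.println("[" + text.substring(p[0], p[1]) + "]");
        }
    }
}
```

This sketch treats the target as a minimum, so a passage may overshoot it by up to one sentence. Choosing whichever sentence boundary lands closest to the target is an equally plausible policy; the trade-off is between never truncating context and staying nearer the requested width.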

I don't propose doing this by default; besides, it's easy enough to pick your BreakIterator config.

Attachments


- LUCENE_7620_UH_LengthGoalBreakIterator.patch, 13 kB, David Smiley, 07/Jan/17 05:10
- LUCENE_7620_UH_LengthGoalBreakIterator.patch, 12 kB, David Smiley, 06/Jan/17 22:17
- LUCENE_7620_UH_LengthGoalBreakIterator.patch, 9 kB, David Smiley, 06/Jan/17 05:28

Issue Links

is related to: SOLR-9935 When hl.method=unified add support for hl.fragsize param (Resolved)


People

Assignee: David Smiley
Reporter: David Smiley
Votes: 1
Watchers: 5

Dates

Created: 05/Jan/17 16:00
Updated: 28/Aug/22 15:08
Resolved: 08/Jan/17 04:19