[LUCENE-7438] UnifiedHighlighter - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 6.2
Fix Version/s: 6.3
Component/s: modules/highlighter
Labels:
None

Lucene Fields:

New, Patch Available

Description

The UnifiedHighlighter is an evolution of the PostingsHighlighter that is able to highlight using offsets from postings, term vectors, or analysis (a TokenStream). Lucene’s existing highlighters are mostly demarcated along offset source lines, whereas here it is unified – hence this proposed name. In this highlighter, the offset source strategy is separated from the core highlighting functionality. The UnifiedHighlighter further improves on the PostingsHighlighter’s design by supporting accurate phrase highlighting using an approach similar to the standard highlighter’s WeightedSpanTermExtractor. The next major improvement is a hybrid offset source strategy that utilizes postings and “light” term vectors (i.e. just the terms) for highlighting multi-term queries (wildcards) without resorting to analysis. Phrase highlighting and wildcard highlighting can both be disabled if you’d rather highlight a little faster, at the cost of reflecting the query less accurately.
We’ve benchmarked an earlier version of this highlighter comparing it to the other highlighters and the results were exciting! It’s tempting to share those results but it’s definitely due for another benchmark, so we’ll work on that. Performance was the main motivator for creating the UnifiedHighlighter, as the standard Highlighter (the only one meeting Bloomberg Law’s accuracy requirements) wasn’t fast enough, even with term vectors along with several improvements we contributed back, and even after we forked it to highlight in multiple threads.
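To illustrate the idea of separating the offset source strategy from the core highlighting logic, here is a minimal, self-contained Java sketch. It is not the code from the patch and does not use Lucene's API: `OffsetSource`, `AnalysisOffsetSource`, `StoredOffsetSource`, and `highlight` are hypothetical names invented for this example, the "analysis" is a naive regex tokenizer, and the "stored" offsets are a plain map standing in for offsets kept in postings or term vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: none of these types are Lucene classes.
class OffsetHighlightSketch {

    // Hypothetical strategy: supplies [start, end) character offsets of query-term matches.
    interface OffsetSource {
        List<int[]> offsets(String text, Set<String> queryTerms);
    }

    // Derives offsets by re-tokenizing the text, standing in for analysis (a TokenStream).
    static class AnalysisOffsetSource implements OffsetSource {
        private static final Pattern WORD = Pattern.compile("\\w+");

        @Override
        public List<int[]> offsets(String text, Set<String> queryTerms) {
            List<int[]> result = new ArrayList<>();
            Matcher m = WORD.matcher(text);
            while (m.find()) {
                // Lowercase each token before comparing, like a simple analyzer would.
                if (queryTerms.contains(m.group().toLowerCase(Locale.ROOT))) {
                    result.add(new int[] {m.start(), m.end()});
                }
            }
            return result;
        }
    }

    // Looks offsets up in a precomputed map, standing in for postings or term vectors.
    static class StoredOffsetSource implements OffsetSource {
        private final Map<String, List<int[]>> offsetsByTerm;

        StoredOffsetSource(Map<String, List<int[]>> offsetsByTerm) {
            this.offsetsByTerm = offsetsByTerm;
        }

        @Override
        public List<int[]> offsets(String text, Set<String> queryTerms) {
            List<int[]> result = new ArrayList<>();
            for (String term : queryTerms) {
                result.addAll(offsetsByTerm.getOrDefault(term, Collections.emptyList()));
            }
            // Merge the per-term lists into document order.
            result.sort(Comparator.comparingInt(o -> o[0]));
            return result;
        }
    }

    // Core highlighting: knows nothing about where the offsets came from.
    // Assumes the offsets are sorted and do not overlap.
    static String highlight(String text, Set<String> queryTerms, OffsetSource source) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (int[] o : source.offsets(text, queryTerms)) {
            sb.append(text, last, o[0]).append("<b>").append(text, o[0], o[1]).append("</b>");
            last = o[1];
        }
        return sb.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        String text = "Fast highlighting with fast offsets";
        Set<String> terms = Collections.singleton("fast");
        Map<String, List<int[]>> stored = Collections.singletonMap(
                "fast", Arrays.asList(new int[] {0, 4}, new int[] {23, 27}));
        System.out.println(highlight(text, terms, new AnalysisOffsetSource()));
        System.out.println(highlight(text, terms, new StoredOffsetSource(stored)));
    }
}
```

Both sources feed the same `highlight` method, so swapping how offsets are obtained does not touch the formatting code. The sketch leaves out everything that makes the real problem hard, such as phrase and wildcard matching, passage selection, and scoring.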

Attachments

- LUCENE-7438.patch (01/Oct/16 02:22, 393 kB, David Smiley)
- LUCENE_7438_UH_small_changes.patch (01/Oct/16 16:25, 9 kB, David Smiley)
- LUCENE_7438_UH_benchmark.patch (20/Sep/16 04:31, 28 kB, David Smiley)
- LUCENE_7438_UH_benchmark.patch (05/Oct/16 12:39, 58 kB, David Smiley)

Issue Links

contains: LUCENE-4825 PostingsHighlighter support for positional queries (Closed)
is related to: SOLR-9708 Expose UnifiedHighlighter in Solr (Resolved)
links to: GitHub Pull Request #79

People

Assignee: David Smiley
Reporter: Timothy M. Rodriguez
Votes: 8
Watchers: 16

Dates

Created: 07/Sep/16 14:54
Updated: 28/Aug/22 15:02
Resolved: 07/Oct/16 13:59