[LUCENE-6445] Highlighter TokenSources simplification; just one getAnyTokenStream()

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.2
Component/s: modules/highlighter
Labels: None
Lucene Fields: New

Description

The Highlighter's "TokenSources" class has quite a few utility methods for getting a TokenStream from either term vectors or analyzed text. I think it's too much:

- Some methods go to term vectors and some don't. If you don't want to go to term vectors, it's quite easy for the caller to get the field value and invoke the Analyzer on it directly.
- Some methods may return null and some never do; I can't tell which at a glance.
- Some methods read the Document (to get a field value) from the IndexReader and some don't. Furthermore, TokenSources is not an ideal place to load the document, since your app might be using an IndexSearcher with a document cache (e.g. SolrIndexSearcher).
- None of the methods accept a Fields instance from term vectors as a parameter. Because of how Lucene's term vector format works, this is a performance trap if you don't re-use one instance across the fields of the document that you're highlighting.
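The last point can be illustrated with a minimal sketch. It does not use Lucene: `CountingReader` and `DocTermVectors` are hypothetical stand-ins for an IndexReader and a term vector Fields instance, and the load counter is an assumption standing in for whatever per-document cost the real term vector format incurs. It only shows why fetching once per document and passing that instance in is cheaper than re-fetching for every field.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: every type here is a hypothetical stand-in, not a Lucene class. */
public class TermVectorReuseSketch {

    /** Stand-in for the per-document term vector container (the role Fields plays). */
    static final class DocTermVectors {
        private final Map<String, List<String>> termsByField;

        DocTermVectors(Map<String, List<String>> termsByField) {
            this.termsByField = termsByField;
        }

        /** Returns the terms for a field, or null if the field has no term vector. */
        List<String> terms(String field) {
            return termsByField.get(field);
        }
    }

    /** Stand-in for a reader that counts how often a document's term vectors are loaded. */
    static final class CountingReader {
        private final Map<Integer, Map<String, List<String>>> stored = new HashMap<>();
        int loadCount = 0;

        void add(int docId, Map<String, List<String>> vectors) {
            stored.put(docId, vectors);
        }

        /** Each call models one (assumed expensive) per-document load. */
        DocTermVectors getTermVectors(int docId) {
            loadCount++;
            return new DocTermVectors(stored.getOrDefault(docId, Map.of()));
        }
    }

    /** Builds a reader holding one document (id 0) with three fields. */
    static CountingReader sampleReader() {
        CountingReader reader = new CountingReader();
        Map<String, List<String>> vectors = new HashMap<>();
        vectors.put("title", List.of("highlighter", "simplification"));
        vectors.put("body", List.of("token", "stream", "term", "vectors"));
        vectors.put("summary", List.of("token", "sources"));
        reader.add(0, vectors);
        return reader;
    }

    /** The trap: the helper fetches the document's term vectors again for every field. */
    public static int loadsWhenRefetchingPerField(List<String> fields) {
        CountingReader reader = sampleReader();
        for (String field : fields) {
            DocTermVectors tv = reader.getTermVectors(0);
            tv.terms(field);
        }
        return reader.loadCount;
    }

    /** The alternative: the caller fetches once and passes the same instance for each field. */
    public static int loadsWhenReusing(List<String> fields) {
        CountingReader reader = sampleReader();
        DocTermVectors tv = reader.getTermVectors(0);
        for (String field : fields) {
            tv.terms(field);
        }
        return reader.loadCount;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("title", "body", "summary");
        System.out.println("re-fetch per field: " + loadsWhenRefetchingPerField(fields) + " loads");
        System.out.println("reuse one instance: " + loadsWhenReusing(fields) + " loads");
    }
}
```

With three highlighted fields, the re-fetching variant performs three loads and the re-using variant performs one; the gap grows with the number of fields highlighted per document.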

Attachments

- LUCENE-6445_TokenSources_simplification.patch (48 kB, 24/Apr/15 20:34, David Smiley)

Issue Links

is related to

- LUCENE-6423 New LimitTokenOffsetFilter (Closed)
- SOLR-5855 re-use document term-vector Fields instance across fields in the DefaultSolrHighlighter (Closed)

relates to

- LUCENE-6392 Add offset limit to Highlighter's TokenStreamFromTermVector (Closed)

People

Assignee: David Smiley

Reporter: David Smiley

Votes: 0

Watchers: 3

Dates

Created: 20/Apr/15 18:38

Updated: 28/Aug/22 14:31

Resolved: 28/Apr/15 16:14