Lucene - Core
LUCENE-2939

Highlighter should try and use maxDocCharsToAnalyze in WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as when using CachingTokenStream

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: modules/highlighter
    • Labels: None

    Description

      Highlighting huge documents can be drastically slower than necessary because the entire field is added to the MemoryIndex. This cost can be greatly reduced in many cases by respecting maxDocCharsToAnalyze when adding the field.

      Things can be improved even further by respecting this setting in CachingTokenStream as well.
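      The idea can be sketched as follows. This is an illustrative, self-contained analogue, not Lucene's actual classes: the method and class names below are hypothetical, standing in for the truncation done before a field reaches the in-memory index and for an offset-limited token stream.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical sketch of the maxDocCharsToAnalyze idea; not Lucene API.
      public class CharLimitSketch {

          // Truncate the raw field text before analysis, so the cost of
          // adding it to an in-memory index is bounded by the char limit.
          static String truncateForAnalysis(String fieldText, int maxDocCharsToAnalyze) {
              if (fieldText.length() <= maxDocCharsToAnalyze) {
                  return fieldText;
              }
              return fieldText.substring(0, maxDocCharsToAnalyze);
          }

          // Analogue of an offset-limited token stream: stop emitting
          // tokens once a token starts past the character limit.
          static List<String> limitedTokens(String text, int maxChars) {
              List<String> tokens = new ArrayList<>();
              int pos = 0;
              for (String tok : text.split("\\s+")) {
                  int start = text.indexOf(tok, pos);
                  if (start >= maxChars) {
                      break; // token begins past the limit: stop analyzing
                  }
                  tokens.add(tok);
                  pos = start + tok.length();
              }
              return tokens;
          }

          public static void main(String[] args) {
              String doc = "alpha beta gamma delta";
              System.out.println(truncateForAnalysis(doc, 10)); // alpha beta
              System.out.println(limitedTokens(doc, 10));       // [alpha, beta]
          }
      }
      ```

      Either cut-off bounds analysis work by document prefix length rather than full document size, which is what makes highlighting huge documents cheap.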

      Attachments

        1. LUCENE-2939.patch
          11 kB
          Mark Miller
        2. LUCENE-2939.patch
          10 kB
          Mark Miller
        3. LUCENE-2939.patch
          9 kB
          Mark Miller
        4. LUCENE-2939.patch
          7 kB
          Mark Miller


            People

              Assignee: markrmiller@gmail.com Mark Miller
              Reporter: markrmiller@gmail.com Mark Miller
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: