Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
Description
In DefaultSolrHighlighter, the hl.maxAnalyzedChars figure is used to constrain how much text is analyzed before the highlighter stops, in the interests of performance. For a multi-valued field, it effectively treats each value anew, no matter how much text was previously analyzed for other values of the same field in the current document. The PostingsHighlighter doesn't work this way: there, hl.maxAnalyzedChars is effectively the total budget for a field for a document, no matter how many values there might be. It's not reset for each value. I think this makes more sense. When we loop over the values, we should subtract from hl.maxAnalyzedChars the length of the value just checked. The motivation here is consistency with PostingsHighlighter, and to allow hl.maxAnalyzedChars to be pushed down to term vector uninversion, which wouldn't be possible for multi-valued fields given the current way this parameter is used.
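To make the proposed accounting concrete, here is a minimal sketch of a per-field budget that is decremented by the length of each value as the loop proceeds. It is not the actual DefaultSolrHighlighter code; the class `FieldBudgetSketch` and the method `charsAnalyzedPerValue` are hypothetical names used only for illustration, and real analysis is replaced by simple character counting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Solr.
public class FieldBudgetSketch {

    /**
     * Returns how many characters of each value would be analyzed when
     * maxAnalyzedChars is treated as one budget shared by all values of a field.
     */
    static List<Integer> charsAnalyzedPerValue(List<String> values, int maxAnalyzedChars) {
        List<Integer> analyzed = new ArrayList<>();
        int remaining = maxAnalyzedChars;
        for (String value : values) {
            if (remaining <= 0) {
                break; // budget exhausted: later values are not analyzed at all
            }
            // Analyze at most the remaining budget for this value.
            analyzed.add(Math.min(value.length(), remaining));
            // Subtract the length of the value just checked.
            remaining -= value.length();
        }
        return analyzed;
    }

    public static void main(String[] args) {
        // Three values of 10 characters each, with a total budget of 25.
        List<String> values = Arrays.asList("aaaaaaaaaa", "bbbbbbbbbb", "cccccccccc");
        System.out.println(charsAnalyzedPerValue(values, 25)); // prints [10, 10, 5]
    }
}
```

Under the per-value behavior described above, each of the three values would instead be analyzed in full (10 characters each), because the limit would be applied anew for every value.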
Interestingly, I noticed that Solr's use of FastVectorHighlighter doesn't honor hl.maxAnalyzedChars, as the FVH doesn't have a knob for that. It does have hl.phraseLimit, a limit that could be used for a similar purpose, albeit applied differently.
Furthermore, DefaultSolrHighlighter.doHighlightingByHighlighter should exit early from its field value loop if it reaches hl.snippets, and if hl.preserveMulti=true.
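As a rough sketch of the early exit, the loop below stops examining field values once the requested number of snippets has been collected. The names `EarlyExitSketch`, `collectSnippets`, and `valuesExamined` are hypothetical, substring matching stands in for real highlighting, and the interaction with hl.preserveMulti is deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Solr.
public class EarlyExitSketch {

    /** Number of values examined by the last call, exposed for demonstration. */
    static int valuesExamined;

    /** Collects up to maxSnippets values containing the term, stopping as soon as the limit is reached. */
    static List<String> collectSnippets(List<String> values, String term, int maxSnippets) {
        List<String> snippets = new ArrayList<>();
        valuesExamined = 0;
        for (String value : values) {
            if (snippets.size() >= maxSnippets) {
                break; // enough snippets: skip the remaining values
            }
            valuesExamined++;
            if (value.contains(term)) {
                snippets.add(value);
            }
        }
        return snippets;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("solr rocks", "lucene", "solr again", "solr third");
        System.out.println(collectSnippets(values, "solr", 2)); // prints [solr rocks, solr again]
        System.out.println(valuesExamined);                     // prints 3
    }
}
```

The fourth value is never examined, which is the saving the early exit is meant to provide.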
Attachments
Issue Links
- contains
-
SOLR-7488 suspicious FVH init code in DefaultSolrHighlighter even when FVH should not be used
- Closed
- is duplicated by
-
SOLR-7326 Reduce hl.maxAnalyzedChars budget for multi-valued fields in the default highlighter
- Closed
- is related to
-
SOLR-4656 Add hl.maxMultiValuedToExamine to limit the number of multiValued entries examined while highlighting
- Closed
- relates to
-
SOLR-7327 DefaultSolrHighlighter should lazily create a FVH FieldQuery.
- Closed