Lucene - Core / LUCENE-8848

UnifiedHighlighter should highlight all Query types that implement Weight.matches

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.2
    • Component/s: modules/highlighter
    • Labels: None
    • Lucene Fields: New

      Description

      The UnifiedHighlighter (UH) internally extracts terms and automata from the query. This usually works well, but a Query might be of a type the UH doesn't know: a leaf query that behaves like a MultiTermQuery yet is not a subclass of it, or is a subclass but one the UH doesn't know how to extract an automaton from. The UH is oblivious to this and probably won't highlight such a query. If re-analysis of the text is necessary, the UH pre-filters all terms down to those it thinks are pertinent, which can drop the terms this query would match. Or, if offsets are in the postings, the UH could perform very poorly by running this query against the index for each highlighted document without recognizing that re-analysis is the more appropriate path.

      I think that to solve this, UnifiedHighlighter.getFieldHighlighter needs to inspect the query (using a QueryVisitor) to see whether it contains a leaf query that it doesn't know how to pull an automaton from and that is not in a special list (like MatchAllDocsQuery). If we find one, we avoid choosing OffsetSource.POSTINGS or OffsetSource.NONE_NEEDED, since we might in effect have an MTQ-like query. If a MemoryIndex is needed, we don't pre-filter the terms, since we can't assume we know precisely which terms are pertinent.
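
      The decision described above can be sketched roughly as follows. This is a minimal, self-contained model and not the attached patch: every class here (OffsetSourceSketch, its nested Query types, the reduced OffsetSource enum) is a hypothetical stand-in rather than the Lucene class of the same name, the recursive walk stands in for a QueryVisitor, and falling back to ANALYSIS is an assumption about what "avoiding" POSTINGS and NONE_NEEDED would mean in practice.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration only; NOT the Lucene classes.
class OffsetSourceSketch {

    // Reduced set of offset sources (an assumption; the real enum may have more values).
    enum OffsetSource { POSTINGS, TERM_VECTORS, ANALYSIS, NONE_NEEDED }

    interface Query {}

    static final class TermQuery implements Query {}          // terms can be extracted
    static final class WildcardQuery implements Query {}      // an automaton can be extracted
    static final class MatchAllDocsQuery implements Query {}  // on the "special list"
    static final class CustomLeafQuery implements Query {}    // unknown to the highlighter

    static final class BooleanQuery implements Query {
        final List<Query> clauses;
        BooleanQuery(Query... clauses) { this.clauses = Arrays.asList(clauses); }
    }

    // Walks the query tree and reports whether any leaf is of an unrecognized type.
    static boolean hasUnknownLeaf(Query query) {
        if (query instanceof BooleanQuery) {
            for (Query clause : ((BooleanQuery) query).clauses) {
                if (hasUnknownLeaf(clause)) {
                    return true;
                }
            }
            return false;
        }
        boolean known = query instanceof TermQuery
                || query instanceof WildcardQuery
                || query instanceof MatchAllDocsQuery;
        return !known;
    }

    // Avoids POSTINGS and NONE_NEEDED when an unknown leaf might be MTQ-like.
    // Falling back to ANALYSIS is an assumption made for this sketch.
    static OffsetSource chooseOffsetSource(OffsetSource preferred, Query query) {
        boolean risky = preferred == OffsetSource.POSTINGS
                || preferred == OffsetSource.NONE_NEEDED;
        if (risky && hasUnknownLeaf(query)) {
            return OffsetSource.ANALYSIS;
        }
        return preferred;
    }

    // Term pre-filtering is only safe when every leaf is understood.
    static boolean mayPreFilterTerms(Query query) {
        return !hasUnknownLeaf(query);
    }

    public static void main(String[] args) {
        Query query = new BooleanQuery(new TermQuery(), new CustomLeafQuery());
        System.out.println(chooseOffsetSource(OffsetSource.POSTINGS, query));
        System.out.println(mayPreFilterTerms(query));
    }
}
```

      A single unknown leaf anywhere in the tree is enough to change both decisions, which matches the conservative stance in the description: when the highlighter cannot be sure which terms are pertinent, it should not assume.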

      We needn't bother extracting terms and automata in this case either; it's wasted effort, which can involve building a CharacterRunAutomaton (see MultiTermHighlighting.binaryToCharRunAutomaton). Speaking of which, it'd be nice to avoid that in other cases as well, such as for WEIGHT_MATCHES when we aren't using a MemoryIndex (and thus do no term pre-filtering).

        Attachments

        1. LUCENE-8848.patch
          25 kB
          David Wayne Smiley

              People

              • Assignee: David Wayne Smiley (dsmiley)
              • Reporter: David Wayne Smiley (dsmiley)
              • Votes: 0
              • Watchers: 5
