Lucene - Core: LUCENE-7719

UnifiedHighlighter doesn't handle some AutomatonQuery's with multi-byte chars

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0
    • Component/s: modules/highlighter
    • Labels: None
    • Lucene Fields: New

      Description

      In MultiTermHighlighting, a CharacterRunAutomaton is created from the result of AutomatonQuery.getAutomaton, which is byte oriented rather than character oriented. For ASCII terms this is safe, because each ASCII character encodes to a single UTF-8 byte with the same value, but it is not safe for multi-byte characters. This is most likely to rear its head with a WildcardQuery; PrefixQuery isn't affected because of special casing in MultiTermHighlighting. Nonetheless, it would be nice to get a general fix in so that MultiTermHighlighting can remove its special cases for PrefixQuery and TermRangeQuery (both subclass AutomatonQuery).
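      To illustrate the mismatch described above, here is a minimal, self-contained sketch that does not use Lucene at all. The class and method names (ByteVsCharMatching, BytePrefixMatcher, matchAsChars, matchAsUtf8) are hypothetical: BytePrefixMatcher is a toy stand-in for a byte-oriented automaton whose transition labels are UTF-8 byte values. Feeding it UTF-16 char values (the buggy path) works for ASCII but fails as soon as a character needs more than one UTF-8 byte. Encoding the text to UTF-8 first (the corrected path) restores the match. This only shows the general idea; it is not a description of how the actual fix was implemented.

```java
import java.nio.charset.StandardCharsets;

public class ByteVsCharMatching {

    /**
     * Hypothetical toy stand-in for a byte-oriented prefix automaton: it accepts
     * any label sequence that starts with the UTF-8 bytes of the prefix.
     * Labels are ints in 0..255, as a byte automaton's transitions would be.
     */
    static final class BytePrefixMatcher {
        private final int[] prefixLabels;

        BytePrefixMatcher(String prefix) {
            byte[] utf8 = prefix.getBytes(StandardCharsets.UTF_8);
            prefixLabels = new int[utf8.length];
            for (int i = 0; i < utf8.length; i++) {
                prefixLabels[i] = utf8[i] & 0xFF; // unsigned byte value
            }
        }

        /** Runs the matcher over a sequence of integer labels. */
        boolean run(int[] labels) {
            if (labels.length < prefixLabels.length) {
                return false;
            }
            for (int i = 0; i < prefixLabels.length; i++) {
                if (labels[i] != prefixLabels[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Buggy path: feeds UTF-16 char values straight into the byte-oriented matcher. */
    static boolean matchAsChars(BytePrefixMatcher m, String text) {
        int[] labels = new int[text.length()];
        for (int i = 0; i < text.length(); i++) {
            labels[i] = text.charAt(i);
        }
        return m.run(labels);
    }

    /** Corrected path: encodes the text to UTF-8 first, so labels are bytes as expected. */
    static boolean matchAsUtf8(BytePrefixMatcher m, String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        int[] labels = new int[utf8.length];
        for (int i = 0; i < utf8.length; i++) {
            labels[i] = utf8[i] & 0xFF;
        }
        return m.run(labels);
    }

    public static void main(String[] args) {
        BytePrefixMatcher ascii = new BytePrefixMatcher("caf");
        // "caf" followed by e-acute (U+00E9): one char, but two UTF-8 bytes (0xC3 0xA9).
        BytePrefixMatcher accented = new BytePrefixMatcher("caf\u00e9");

        // ASCII: char values and UTF-8 byte values coincide, so both paths agree.
        System.out.println(matchAsChars(ascii, "cafeteria")); // true
        System.out.println(matchAsUtf8(ascii, "cafeteria"));  // true

        // Multi-byte: the char path compares 0xE9 against 0xC3 and fails.
        System.out.println(matchAsChars(accented, "caf\u00e9s")); // false
        System.out.println(matchAsUtf8(accented, "caf\u00e9s"));  // true
    }
}
```

      The sketch uses a prefix only because it is the simplest automaton to write by hand; the same char-versus-byte label confusion applies to any byte-oriented automaton that is run over characters.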

      AFAICT, this bug has likely been present in the PostingsHighlighter since its inception.


    People

    • Assignee: David Smiley (dsmiley)
    • Reporter: David Smiley (dsmiley)
    • Votes: 0
    • Watchers: 3
