Lucene - Core / LUCENE-8365

ArrayIndexOutOfBoundsException in UnifiedHighlighter

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 7.3.1
    • Fix Version/s: 7.4.1, 7.5, 8.0
    • Component/s: modules/highlighter
    • Labels: None
    • Lucene Fields: New, Patch Available

      Description

      We see ArrayIndexOutOfBoundsExceptions coming out of the UnifiedHighlighter in our production logs from time to time:

      java.lang.ArrayIndexOutOfBoundsException
      	at java.base/java.lang.System.arraycopy(Native Method)
      	at org.apache.lucene.search.uhighlight.PhraseHelper$SpanCollectedOffsetsEnum.add(PhraseHelper.java:386)
      	at org.apache.lucene.search.uhighlight.PhraseHelper$OffsetSpanCollector.collectLeaf(PhraseHelper.java:341)
      	at org.apache.lucene.search.spans.TermSpans.collect(TermSpans.java:121)
      	at org.apache.lucene.search.spans.NearSpansOrdered.collect(NearSpansOrdered.java:149)
      	at org.apache.lucene.search.spans.NearSpansUnordered.collect(NearSpansUnordered.java:171)
      	at org.apache.lucene.search.spans.FilterSpans.collect(FilterSpans.java:120)
      	at org.apache.lucene.search.uhighlight.PhraseHelper.createOffsetsEnumsForSpans(PhraseHelper.java:261)
      ...
      

      It turns out that there is an "off by one" error in the UnifiedHighlighter's code that, as far as I can tell, is only triggered when two nested SpanNearQueries contain the same term.

      The resulting behaviour depends on the content of the highlighted document: either some highlighted terms go missing, or an ArrayIndexOutOfBoundsException is thrown.
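
To illustrate how a single off-by-one in an array shift can produce both symptoms, here is a minimal, self-contained sketch. It is not the PhraseHelper code from the stack trace: the class SortedOffsets, its buggy flag, and the specific mistake (shifting by two slots instead of one) are hypothetical and chosen only for illustration. Whether an entry silently disappears or System.arraycopy fails depends purely on how full the buffer is when an out-of-order value arrives.

```java
import java.util.Arrays;

/** Toy sorted int buffer. Hypothetical illustration only, NOT the Lucene implementation. */
final class SortedOffsets {
    private int[] data = new int[4];
    private int size;
    private final boolean buggy; // hypothetical switch to enable the off-by-one

    SortedOffsets(boolean buggy) {
        this.buggy = buggy;
    }

    /** Inserts value in sorted order, ignoring exact duplicates. */
    void add(int value) {
        // Scan backwards for the last element <= value.
        int i = size - 1;
        while (i >= 0 && data[i] > value) {
            i--;
        }
        if (i >= 0 && data[i] == value) {
            return; // already present
        }
        int insertAt = i + 1;
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2); // make room for one more element
        }
        int shiftLen = size - insertAt;
        if (shiftLen > 0) {
            // Correct: shift the tail right by one slot. Buggy: shift it by two.
            int dest = buggy ? insertAt + 2 : insertAt + 1;
            System.arraycopy(data, insertAt, data, dest, shiftLen);
        }
        data[insertAt] = value;
        size++;
    }

    int[] toArray() {
        return Arrays.copyOf(data, size);
    }

    public static void main(String[] args) {
        SortedOffsets ok = new SortedOffsets(false);
        for (int v : new int[] {10, 20, 30, 15}) {
            ok.add(v);
        }
        System.out.println(Arrays.toString(ok.toArray())); // [10, 15, 20, 30]

        // Buffer half full: the bad shift stays in bounds, but 20 is lost.
        SortedOffsets lossy = new SortedOffsets(true);
        for (int v : new int[] {10, 20, 15}) {
            lossy.add(v);
        }
        System.out.println(Arrays.toString(lossy.toArray())); // [10, 15, 0]

        // Buffer nearly full: the same bad shift runs past the end of the array.
        SortedOffsets crashing = new SortedOffsets(true);
        try {
            for (int v : new int[] {10, 20, 30, 15}) {
                crashing.add(v);
            }
        } catch (IndexOutOfBoundsException e) {
            System.out.println("arraycopy out of bounds: " + e);
        }
    }
}
```

In this sketch, values added in ascending order never trigger the shift, so the defect stays hidden until an insertion lands in the middle of the buffer. That is consistent with a bug that only shows up for certain queries and documents.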

              People

              • Assignee: Simon Willnauer (simonw)
              • Reporter: Marc Morissette (marc.morissette)

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 20m