Solr / SOLR-16020

StringIndexOutOfBoundsException in BaseFragmentsBuilder when using the Highlighter

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: highlighter
    • Labels: None

    Description

      Our production monitoring shows sporadic cases (every few days) in which a StringIndexOutOfBoundsException is thrown inside the highlighter, causing Solr to return HTTP 500 responses.

      Admittedly, this is a Solr 7.7.3 node, but perhaps someone can help me investigate anyway.

      Here's the stack trace:

      java.lang.StringIndexOutOfBoundsException: begin 66, end 43, length 201
      at java.base/java.lang.String.checkBoundsBeginEnd(Unknown Source)
      at java.base/java.lang.String.substring(Unknown Source)
      at org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder.makeFragment(BaseFragmentsBuilder.java:180)
      at org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder.createFragments(BaseFragmentsBuilder.java:144)
      at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBestFragments(FastVectorHighlighter.java:186)
      at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:520)
      at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingOfField(DefaultSolrHighlighter.java:478)
      at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:442)
      at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:183)
      at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
      [...]
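      For readers of the trace: the exception is thrown by String.substring, called from BaseFragmentsBuilder.makeFragment with an end index (43) smaller than its begin index (66) on a 201-character string. The minimal Java sketch below reproduces only that JDK behaviour, not the highlighter. The class and method names are hypothetical, and the string of 201 'x' characters stands in for the actual field value, which is unknown.

```java
// Reproduces only the JDK behaviour seen in the trace, not the highlighter itself.
final class SubstringBoundsDemo {

    /** True if substring(begin, end) must throw for a string of the given length. */
    static boolean isInvalidRange(int begin, int end, int length) {
        return begin < 0 || end > length || begin > end;
    }

    /** Calls substring and reports whether it threw StringIndexOutOfBoundsException. */
    static boolean substringThrows(String source, int begin, int end) {
        try {
            source.substring(begin, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Caught: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the unknown 201-character field value.
        char[] chars = new char[201];
        java.util.Arrays.fill(chars, 'x');
        String fieldValue = new String(chars);
        // Indices from the trace: begin 66, end 43.
        System.out.println(isInvalidRange(66, 43, fieldValue.length())); // true
        System.out.println(substringThrows(fieldValue, 66, 43));          // true
    }
}
```

      In other words, the fragment boundaries the highlighter computed were out of order before substring was ever called. This is why masking the exception would only hide the underlying offset problem.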

      I found SOLR-4137, which looks nearly identical, but it was already fixed in 4.3, so I thought it worth filing this as a separate issue.

      My (very limited) understanding from SOLR-4137 is that the bug surfaces in the highlighter, but that you would rather find the root cause than catch (and thereby mask) the exception there. The root cause might be a bogus component in the analyzer chain, for example one that produces inconsistent token offsets.
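      To check a document's analysis output for the kind of offset problem suspected above, one option is to copy the token start/end offsets shown on Solr's Analysis screen into a small check like the one below. This helper is hypothetical and not part of Solr or Lucene. The invariants it tests (0 <= start <= end <= text length, and start offsets never decreasing from one token to the next) are assumptions about what the FastVectorHighlighter needs, not something confirmed in this issue.

```java
// Hypothetical helper: reports the first token whose offsets break an assumed invariant.
final class OffsetSanityCheck {

    /** Start/end character offsets of one token, in the order the analyzer emitted it. */
    static final class TokenOffsets {
        final String term;
        final int start;
        final int end;

        TokenOffsets(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns the index of the first suspicious token, or -1 if all offsets look sane.
     * Assumed invariants: 0 <= start <= end <= textLength, and start offsets never
     * decrease from one token to the next (overlapping tokens are still allowed).
     */
    static int firstSuspiciousToken(TokenOffsets[] tokens, int textLength) {
        int previousStart = 0;
        for (int i = 0; i < tokens.length; i++) {
            TokenOffsets t = tokens[i];
            boolean outOfRange = t.start < 0 || t.end > textLength;
            boolean inverted = t.start > t.end;
            boolean backwards = t.start < previousStart;
            if (outOfRange || inverted || backwards) {
                return i;
            }
            previousStart = t.start;
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "hepatitis screening"; // 19 characters
        TokenOffsets[] good = {
            new TokenOffsets("hepatitis", 0, 9),
            new TokenOffsets("screening", 10, 19),
        };
        TokenOffsets[] bad = {
            new TokenOffsets("hepatitis", 0, 9),
            new TokenOffsets("screening", 10, 19),
            new TokenOffsets("hepatitis", 0, 9), // re-emitted with an earlier start offset
        };
        System.out.println(firstSuspiciousToken(good, text.length())); // -1
        System.out.println(firstSuspiciousToken(bad, text.length()));  // 2
    }
}
```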

      Please advise how we could proceed and what information I should provide.

      In particular, if I should run the query (which was "hepatitis+screening", including the quotes) or a document through field analysis, how can I find out which document caused the problem?
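      One way to narrow down the failing document is to re-issue the same query with smaller start/rows windows and bisect until a single result position still returns HTTP 500. This assumes that exactly one document is responsible and that the failure is deterministic for it. The sketch below shows the bisection and a request-URL builder. The class name, base URL and collection name are hypothetical. The fixed parameters (hl=true&fl=id&wt=json) would need to be replaced by the highlighting parameters of the original request, and the sketch assumes the "+" in the reported query is a URL-encoded space. The windowFails callback is where the real HTTP request would go; here it is simulated. URLEncoder.encode(String, Charset) requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.function.BiPredicate;

// Hypothetical helper for locating the result position whose highlighting fails.
final class FailingDocFinder {

    /**
     * Narrows the window [start, start + rows) down to one result position.
     * windowFails.test(s, r) should re-run the query with start=s and rows=r and
     * return true if Solr answers with HTTP 500. Assumes exactly one bad document
     * and a deterministic failure. Returns -1 if the initial window does not fail.
     */
    static int findFailingPosition(int start, int rows, BiPredicate<Integer, Integer> windowFails) {
        if (rows <= 0 || !windowFails.test(start, rows)) {
            return -1;
        }
        int lo = start;
        int count = rows;
        // Invariant (under the assumptions above): the window [lo, lo + count) fails.
        while (count > 1) {
            int half = count / 2;
            if (windowFails.test(lo, half)) {
                count = half;   // the failure is in the first half
            } else {
                lo += half;     // otherwise it must be in the second half
                count -= half;
            }
        }
        return lo;
    }

    /** Builds a select URL for one window; the fixed parameters are placeholders. */
    static String selectUrl(String baseUrl, String q, int start, int rows) {
        return baseUrl + "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&start=" + start + "&rows=" + rows + "&hl=true&fl=id&wt=json";
    }

    public static void main(String[] args) {
        // Simulated Solr: pretend the document at result position 7 breaks highlighting.
        int badPosition = 7;
        BiPredicate<Integer, Integer> fakeWindowFails =
                (s, r) -> s <= badPosition && badPosition < s + r;
        System.out.println(findFailingPosition(0, 20, fakeWindowFails)); // 7
        System.out.println(selectUrl("http://localhost:8983/solr/mycollection",
                "\"hepatitis screening\"", 7, 1));
    }
}
```

      Once a position has been found, re-running that single-row window with hl=false should return the document's id without triggering the exception. Its field value can then be run through field analysis.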

      The response from Solr looked much like a regular result, except that it had a 500 status code and the exception above in the "error" field of the JSON.

          People

            Assignee: Unassigned
            Reporter: Matthias Pigulla (mpdude)