Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-1404

Random failures with highlighting

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      With a recent Solr nightly, we started getting errors when highlighting.

      I have not been able to reduce our real setup to a minimal one that is failing, but the same error seems to pop up with the configuration below. Note that the QUERY will mostly fail, but it will work sometimes. Notably, after running "java -jar start.jar", the QUERY will work the first time, but then start failing for a while. Seems that something is not being reset properly.

      The example uses the deprecated HTMLStripWhitespaceTokenizerFactory but the problem apparently also exists with other tokenizers; I was just unable to create a minimal example with other configurations.

      SCHEMA

      <?xml version="1.0" encoding="UTF-8" ?>

      <schema name="example" version="1.2">

      <types>
      <fieldType name="string" class="solr.StrField" />

      <fieldtype name="testtype" class="solr.TextField">
      <analyzer>
      <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory" />
      </analyzer>
      </fieldtype>
      </types>

      <fields>
      <field name="id" type="string" indexed="true" stored="false" />
      <field name="test" type="testtype" indexed="false" stored="true" />
      </fields>

      <uniqueKey>id</uniqueKey>

      </schema>

      INDEX

      URL=http://localhost:8983/solr/update

      curl $URL --data-binary '<add><doc><field name="id">1</field><field name="test">test</field></doc></add>' -H 'Content-type:text/xml; charset=utf-8'
      curl $URL --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'

      QUERY

      curl 'http://localhost:8983/solr/select/?hl.fl=test&hl=true&q=id:1'

      ERROR

      org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token test exceeds length of provided text sized 4

      org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token test exceeds length of provided text sized 4
      at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:328)
      at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
      at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
      at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
      at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
      at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
      at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
      at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
      at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
      at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
      at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
      at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
      at org.mortbay.jetty.Server.handle(Server.java:285)
      at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
      at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
      at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
      at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
      at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
      at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
      at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
      Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token test exceeds length of provided text sized 4
      at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:254)
      at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:321)
      ... 23 more

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            andersm Anders Melchiorsen
            Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment