Uploaded image for project: 'Tika'
  1. Tika
  2. TIKA-2529

ArrayIndexOutOfBoundsException when processing certain .doc files

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.16
    • None
    • None
    • None
    • We are using Solr Version 7.1.0

    Description

      When Solr (7.1.0) is trying to parse this .doc file we get following exception:
      Seems to be related to an older form of .doc files because converting the .doc to a .docx and then back to a .doc fixes this issue.

      {
      "responseHeader":

      { "status":500, "QTime":265}

      ,
      "error":

      { "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","java.lang.ArrayIndexOutOfBoundsException"], "msg":"org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@20395b83", "trace":"org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@20395b83\r\n\tat org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)\r\n\tat org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)\r\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)\r\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2484)\r\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:720)\r\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:526)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)\r\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\r\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\r\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\r\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\r\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\r\n\tat org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\r\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\r\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)\r\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)\r\n\tat org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\r\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\r\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\r\n\tat java.lang.Thread.run(Unknown Source)\r\nCaused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@20395b83\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\r\n\tat org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)\r\n\tat org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)\r\n\t... 34 more\r\nCaused by: java.lang.ArrayIndexOutOfBoundsException: -1\r\n\tat org.apache.poi.hwpf.model.StyleSheet.getCharacterStyle(StyleSheet.java:329)\r\n\tat org.apache.poi.hwpf.model.CHPX.getCharacterProperties(CHPX.java:74)\r\n\tat org.apache.poi.hwpf.usermodel.CharacterRun.<init>(CharacterRun.java:100)\r\n\tat org.apache.poi.hwpf.usermodel.Range.getCharacterRun(Range.java:727)\r\n\tat org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:227)\r\n\tat org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:712)\r\n\tat org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:702)\r\n\tat org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:174)\r\n\tat org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)\r\n\tat org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\r\n\t... 38 more\r\n", "code":500}

      }

      Attachments

        1. ProblemFormatIstAlt.doc
          28 kB
          Advokat

        Activity

          People

            Unassigned Unassigned
            advokat Advokat
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated: