Tika / TIKA-2497

Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.16
    • Fix Version/s: None
    • Component/s: parser
    • Labels:
      None

      Description

      We get the following exception when Solr parses certain .pptx files. An example file (BugExample.pptx) is attached.

      <response>
      <lst name="responseHeader">
        <int name="status">500</int>
        <int name="QTime">204</int>
      </lst>
      <lst name="error">
        <lst name="metadata">
          <str name="error-class">org.apache.solr.common.SolrException</str>
          <str name="root-error-class">java.lang.IllegalStateException</str>
        </lst>
        <str name="msg">org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@3225ac62</str>
        <str name="trace">org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@3225ac62
      at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
      at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
      at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
      at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
      at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
      at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
      at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
      at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
      at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
      at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
      at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
      at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
      at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
      at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
      at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
      at org.eclipse.jetty.server.Server.handle(Server.java:534)
      at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
      at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
      at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
      at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
      at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
      at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
      at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
      at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
      at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
      at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
      at java.lang.Thread.run(Unknown Source)
      Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@3225ac62
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
      at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
      ... 34 more
      Caused by: java.lang.IllegalStateException: Schemas (*.xsb) for CTTable can't be loaded - usually this happens when OSGI loading is used and the thread context classloader has no reference to the xmlbeans classes - use POIXMLTypeLoader.setClassLoader() to set the loader, e.g. with CTTable.class.getClassLoader()
      at org.apache.poi.xslf.usermodel.XSLFTable.<init>(XSLFTable.java:76)
      at org.apache.poi.xslf.usermodel.XSLFGraphicFrame.create(XSLFGraphicFrame.java:90)
      at org.apache.poi.xslf.usermodel.XSLFSheet.buildShapes(XSLFSheet.java:112)
      at org.apache.poi.xslf.usermodel.XSLFSheet.initDrawingAndShapes(XSLFSheet.java:173)
      at org.apache.poi.xslf.usermodel.XSLFSheet.getShapes(XSLFSheet.java:157)
      at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(XSLFPowerPointExtractorDecorator.java:110)
      at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:139)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:142)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      ... 37 more
      </str><int name="code">500</int></lst>
      </response>
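
The root cause in the trace says the thread context classloader has no reference to the xmlbeans classes and suggests setting a loader that does. As a rough illustration only, the sketch below shows one generic way to install a classloader temporarily around a parse call when Tika is invoked from your own code. The class name `ContextClassLoaderSketch` and the helper `withContextClassLoader` are hypothetical, and nothing in this report establishes that this resolves the problem inside Solr. The sketch uses only the JDK, so the real parse call is replaced by a placeholder.

```java
import java.util.concurrent.Callable;

public class ContextClassLoaderSketch {

    /**
     * Hypothetical helper: runs the task with the given loader installed as the
     * thread context classloader, then restores the previous loader.
     */
    public static <T> T withContextClassLoader(ClassLoader loader, Callable<T> task)
            throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return task.call();
        } finally {
            // Always restore, even if the task throws.
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: in a real setup this would be a loader that can see the
        // POI/XMLBeans jars, e.g. CTTable.class.getClassLoader() as the
        // exception message suggests.
        ClassLoader parserLoader = ContextClassLoaderSketch.class.getClassLoader();
        ClassLoader before = Thread.currentThread().getContextClassLoader();

        // Placeholder for the actual parse call; it just reports which loader
        // is active while the task runs.
        ClassLoader seenInside = withContextClassLoader(
                parserLoader, () -> Thread.currentThread().getContextClassLoader());

        System.out.println("loader active during task: " + (seenInside == parserLoader));
        System.out.println("loader restored afterwards: "
                + (Thread.currentThread().getContextClassLoader() == before));
    }
}
```

The try/finally restore matters in a servlet container, where request threads are pooled and a leaked context classloader would affect later requests.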

      1. BugExample.pptx
        41 kB
        Advokat
      2. fileupload_passt_configset.zip
        5 kB
        Advokat
      3. solr.log
        20 kB
        Advokat

        Activity

        advokat Advokat added a comment -

        Just tried it again with the same result. This time we used Solr 7.1 with a simple core to use the file import feature. I have attached the solr.log and the configset that was used to reproduce this error. If this is not Tika-related, should I recreate this report in the Solr project?

        advokat Advokat added a comment -

        We are using Solr version 6.6.2 and, as far as I know, we are not doing anything custom with Tika and its dependencies/jars. We send the file to Solr as a stream instead of a simple path, but I am not sure whether that makes a difference in this case.

        tallison@mitre.org Tim Allison added a comment -

        Andreas Beeker, any ideas what may be causing this in Solr?

        tallison@mitre.org Tim Allison added a comment -

        It looks like pure Tika master is able to handle this. Which version of Solr are you using? Are you doing anything custom with Tika and its dependencies/jars? Did you get the same exception before SOLR-10335?


          People

          • Assignee:
            Unassigned
            Reporter:
            advokat Advokat