  1. Tika
  2. TIKA-3961

When a parser exception happens, the "resourceName" key becomes "esourceName"

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.4.1
    • Fix Version/s: None
    • Component/s: core
    • Labels: None
    • Environment: Windows 10. Tika 2.4.1. Tika server.

    Description

      Test env: Windows 10
      Tika 2.4.1, tika server

       

      In my config I've specified:
          <metadataFilter class="org.apache.tika.metadata.filter.IncludeFieldMetadataFilter">
            <params>
              <include>
                <field>X-TIKA:content</field>
                <field>dc:creator</field>
                <field>dc:title</field>
                <field>resourceName</field>
                <field>X-TIKA:EXCEPTION:container_exception</field>
              </include>
            </params>
          </metadataFilter>
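
      As an illustration of what this filter is expected to do, here is a minimal sketch of include-style filtering: only the listed fields survive, and their names are passed through unchanged. This is not Tika's implementation; the function name `include_filter` and the sample metadata are hypothetical.

```python
# Fields listed in the <include> block of the config above.
INCLUDE_FIELDS = {
    "X-TIKA:content",
    "dc:creator",
    "dc:title",
    "resourceName",
    "X-TIKA:EXCEPTION:container_exception",
}


def include_filter(metadata, include=INCLUDE_FIELDS):
    """Keep only the metadata entries whose key is in the include set."""
    return {key: value for key, value in metadata.items() if key in include}


# Hypothetical metadata for illustration; real Tika output has more fields.
sample = {
    "resourceName": "encrypted.docx",
    "Content-Type": "application/x-tika-ooxml-protected",
}
print(include_filter(sample))  # {'resourceName': 'encrypted.docx'}
```

      Under this model a key is either kept verbatim or dropped, so a renamed key such as "esourceName" is not something the filter alone would be expected to produce.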
       

      For a password-protected docx file, Tika returns the following (note the "esourceName" key at the very end):
      [{"X-TIKA:EXCEPTION:container_exception":"org.apache.poi.EncryptedDocumentException: java.security.NoSuchAlgorithmException: Cannot find any provider supporting AES/CBC/NoPadding\r\n\tat org.apache.poi.poifs.crypt.CryptoFunctions.getCipher(CryptoFunctions[7B14:0002-7080] java:274)\r\n\tat org.apache.poi.poifs.crypt.CryptoFunctions.getCipher(CryptoFunctions.java:223)\r\n\tat org.apache.poi.poifs.crypt.agile.AgileDecryptor.hashInput(AgileDecryptor.java:196)\r\n\tat org.apache.poi.poifs.crypt.agile.AgileDecryptor.verifyPasswrd(AgileDecryptor.java:102)\r\n\tat org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:261)\r\n\tat org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositParser.java:298)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)\r\n\tat org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:167)\r\n\tat org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWraper.java:163)\r\n\tat org.apache.tika.server.core.resource.TikaResource.parse(TikaResource.java:352)\r\n\tat org.apache.tika.server.core.resource.RecursiveMetadataResource.parseMetadata(RecursiveMetadataResource.java:78)\r\n\tat org.apache.tika.server.cor.resource.RecursiveMetadataResource.parseMetadataToMetadataList(RecursiveMetadataResource.java:190)\r\n\tat org.apache.tika.server.core.resource.RecursiveMetadataResource.getMetadata(RecursiveMetadataResource.java:179)\r\n\tat sun.reflect.GeneratedMethodAcessor7.invoke(Unknown Source)\r\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\r\n\tat java.lang.reflect.Method.invoke(Method.java:498)\r\n\tat org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(bstractInvoker.java:179)\r\n\tat org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)\r\n\tat org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:201)\r\n\tat 
org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:104)r\n\tat org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)\r\n\tat org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)\r\n\tat org.apache.cxf.phase.PhaseInterceptrChain.doIntercept(PhaseInterceptorChain.java:307)\r\n\tat org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)\r\n\tat org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)\\n\tat org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)\r\n\tat org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.andle(HandlerWrapper.java:127)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)\r\n\tat org.eclipse.jetty.server.handler.ScpedHandler.nextScope(ScopedHandler.java:190)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat org.eclipse.jetty.server.hndler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\r\n\tat org.eclipse.jetty.server.Server.handle(Server.java:516)\r\n\tat org.eclipse.jetty.servr.HttpChannel.lambda$handle$1(HttpChannel.java:487)\r\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)\r\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)\r\n\tat org.eclipse.jetty.server.HttpConnection.onFilable(HttpConnection.java:277)\r\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\r\n\tat 
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\r\n\tat org.eclipse.jetty.io.ChannelEndPoint$1.run(hannelEndPoint.java:104)\r\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)\r\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)\r\n\tat org.eclipse.jetty.util.thread.trategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)\r\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)\r\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.jaa:409)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)\r\n\tat java.lang.Thread.run(Thread.java:827)\r\nCaused by: java.ecurity.NoSuchAlgorithmException: Cannot find any provider supporting AES/CBC/NoPadding\r\n\tat javax.crypto.Cipher.getInstance(Cipher.java:543)\r\n\tat org.apache.poi.poifs.crypt.CryptoFunctions.getCipher(CryptoFunctions.java:258)\r\n\t... 51 more\r\n","esourceName":"encrypted.docx"}]

       

      If I disable returning the exception metadata, resourceName is returned correctly:
      [8D84:0002-60C4] 01/26/2023 05:45:58 PM DEBUG_TIKA write_callback - ptr = t:
      [{"resourceName":"encrypted.docx"}]

       

      I believe this is reproducible with any password-protected docx file.
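
      One way to check a response for mangled keys is to parse the JSON body and compare every key against the include list from the config. The sketch below assumes the body is a JSON list of metadata objects, as in the outputs above; the function name `unexpected_keys` is hypothetical, and the failing sample is a shortened stand-in with the stack trace elided.

```python
import json

# Fields listed in the <include> block of the reported config.
EXPECTED_FIELDS = {
    "X-TIKA:content",
    "dc:creator",
    "dc:title",
    "resourceName",
    "X-TIKA:EXCEPTION:container_exception",
}


def unexpected_keys(response_body, expected=EXPECTED_FIELDS):
    """Return the sorted keys in the response that are not in the include list."""
    records = json.loads(response_body)
    found = set()
    for record in records:
        found.update(key for key in record if key not in expected)
    return sorted(found)


# Shortened stand-ins for the two responses in this report.
good = '[{"resourceName":"encrypted.docx"}]'
bad = (
    '[{"X-TIKA:EXCEPTION:container_exception":"(stack trace elided)",'
    '"esourceName":"encrypted.docx"}]'
)
print(unexpected_keys(good))  # []
print(unexpected_keys(bad))   # ['esourceName']
```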

      Attachments

        1. encrypted.docx
          19 kB
          Josh Burchard

          People

            Assignee: Unassigned
            Reporter: Josh Burchard (jmbox80)
            Votes: 0
            Watchers: 2
