Tika / TIKA-2891

ForkClient "fillBootstrapJar()" lacks a few "MANIFEST.MF" properties

    Details

      Description

      Because big ".doc" files were causing "OOM: heap space" errors, we decided to move to a "ForkParser" so that we can catch these errors, log them, and keep processing the next documents.

      Unfortunately, whenever we have an image in a document, we get the following error:

      Unexpected error in forked server process
      org.apache.tika.exception.TikaException: Unexpected error in forked server process
      ... (several frames showing that the call to "ForkParser.parse" failed)
      Cause: java.util.ServiceConfigurationError: javax.imageio.spi.ImageOutputStreamSpi: Provider com.github.jaiimageio.impl.stream.ChannelImageOutputStreamSpi could not be instantiated
       at java.util.ServiceLoader.fail(ServiceLoader.java:232)
       at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
       at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
       at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
       at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
       at javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:210)
       at javax.imageio.spi.IIORegistry.<init>(IIORegistry.java:138)
       at javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:159)
       at javax.imageio.ImageIO.<clinit>(ImageIO.java:66)
       at org.apache.pdfbox.tools.imageio.ImageIOUtil.writeImage(ImageIOUtil.java:174)
       ...
       Cause: java.lang.ExceptionInInitializerError:
       at com.github.jaiimageio.impl.stream.ChannelImageOutputStreamSpi.<init>(ChannelImageOutputStreamSpi.java:66)
       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
       at java.lang.Class.newInstance(Class.java:442)
       at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
       at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
       at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
       at javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:210)
       ...
       Cause: java.lang.NullPointerException:
       at com.github.jaiimageio.impl.common.PackageUtil.<clinit>(PackageUtil.java:91)
       at com.github.jaiimageio.impl.stream.ChannelImageOutputStreamSpi.<init>(ChannelImageOutputStreamSpi.java:66)
       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
       at java.lang.Class.newInstance(Class.java:442)
       at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
       at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
       at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
       ...
      

      This kind of error did not appear before, when we were only using an "AutoDetectParser". My search for a solution led me to "ForkClient", where you can see that only the "Main-Class" is defined in "META-INF/MANIFEST.MF", whereas "com.github.jaiimageio.impl.common.PackageUtil.<clinit>(PackageUtil.java:91)" checks that "Implementation-Vendor" and "Implementation-Version" are not null.
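
      To illustrate the difference, here is a minimal sketch that uses only "java.util.jar" from the JDK. It is not the actual "ForkClient" code: the "ManifestSketch" object, its helper methods, and the main class, vendor, and version strings are placeholders I made up for the example. It shows that a manifest with only "Main-Class" yields null for the two "Implementation-*" attributes, while a manifest that declares them does not.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets.UTF_8
import java.util.jar.{Attributes, Manifest}

object ManifestSketch {
  // Hypothetical helper: manifest text carrying the extra attributes.
  // All three values are placeholders, not what Tika would actually write.
  def manifestText(mainClass: String, vendor: String, version: String): String =
    "Manifest-Version: 1.0\n" +
      s"Main-Class: $mainClass\n" +
      s"Implementation-Vendor: $vendor\n" +
      s"Implementation-Version: $version\n"

  // Parses manifest text and returns its main attributes.
  def parse(text: String): Attributes =
    new Manifest(new ByteArrayInputStream(text.getBytes(UTF_8))).getMainAttributes

  def main(args: Array[String]): Unit = {
    // Manifest with only Main-Class, as described for the bootstrap jar.
    val bare = parse("Main-Class: com.example.Main\n")
    println(bare.getValue(Attributes.Name.IMPLEMENTATION_VENDOR))  // null
    println(bare.getValue(Attributes.Name.IMPLEMENTATION_VERSION)) // null

    // Manifest that also declares the Implementation-* attributes.
    val full = parse(manifestText("com.example.Main", "example-vendor", "0.0.0"))
    println(full.getValue(Attributes.Name.IMPLEMENTATION_VENDOR))  // example-vendor
    println(full.getValue(Attributes.Name.IMPLEMENTATION_VERSION)) // 0.0.0
  }
}
```

      Whether adding these attributes to the bootstrap jar is enough to avoid the NullPointerException in "PackageUtil" is an assumption on my side; I have not verified it against the forked process.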

      It's quite easy to reproduce:

      1. Download a simple sample file from this link.
      2. Use this piece of code:
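      import java.io.FileInputStream
      import org.apache.tika.fork.ForkParser
      import org.apache.tika.io.TikaInputStream
      import org.apache.tika.metadata.Metadata
      import org.apache.tika.parser.{AutoDetectParser, ParseContext}
      import org.apache.tika.sax.BodyContentHandler

      // ExtractText is an object from our own application (not shown here).
      def test(): Unit = {
        val forkParser = new ForkParser(ExtractText.getClass.getClassLoader, new AutoDetectParser())
        val output = new BodyContentHandler()
        val stream = TikaInputStream.get(new FileInputStream("/path/to/file-sample_100kB.odt"))
        val ctx = new ParseContext()
        // This call fails with the TikaException shown above.
        try forkParser.parse(stream, output, new Metadata(), ctx)
        finally { stream.close(); forkParser.close() } // release the stream and the forked process
      }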

    People

      Assignee: Unassigned
      Reporter: Gangstere44 Quentin Laville
