TIKA-875: Temporary file leak in ImageParser

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0
    • Fix Version/s: 1.2
    • Component/s: parser
    • Labels:
      None

      Description

      The stream obtained through ImageIO.createImageInputStream in org.apache.tika.parser.image.ImageParser.parse(InputStream, ContentHandler, Metadata, ParseContext) is never closed. When many parse operations are performed, the process can easily run out of file descriptors.

      1. patch.svn.diff
        2 kB
        Niels Beekman

        Activity

        n.beekman Niels Beekman added a comment -

        Stack trace showing where the file is opened while parsing a ZIP archive containing many images:

        java.io.RandomAccessFile.open(String, int)
        java.io.RandomAccessFile.<init>(File, String)
        javax.imageio.stream.FileCacheImageInputStream.<init>(InputStream, File)
        com.sun.imageio.spi.InputStreamImageInputStreamSpi.createInputStreamInstance(Object, boolean, File)
        javax.imageio.ImageIO.createImageInputStream(Object)
        org.apache.tika.parser.image.ImageParser.parse(InputStream, ContentHandler, Metadata, ParseContext)
        org.apache.tika.parser.CompositeParser.parse(InputStream, ContentHandler, Metadata, ParseContext)
        org.apache.tika.parser.CompositeParser.parse(InputStream, ContentHandler, Metadata, ParseContext)
        org.apache.tika.parser.AutoDetectParser.parse(InputStream, ContentHandler, Metadata, ParseContext)
        org.apache.tika.parser.DelegatingParser.parse(InputStream, ContentHandler, Metadata, ParseContext)
        org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(InputStream, ContentHandler, Metadata, boolean)
        org.apache.tika.parser.pkg.PackageExtractor.unpack(ArchiveInputStream, XHTMLContentHandler)
        org.apache.tika.parser.pkg.PackageExtractor.parse(InputStream)
        org.apache.tika.parser.pkg.PackageParser.parse(InputStream, ContentHandler, Metadata, ParseContext)
        org.apache.tika.parser.CompositeParser.parse(InputStream, ContentHandler, Metadata, ParseContext)
        org.apache.tika.parser.CompositeParser.parse(InputStream, ContentHandler, Metadata, ParseContext)
        org.apache.tika.parser.AutoDetectParser.parse(InputStream, ContentHandler, Metadata, ParseContext)
        org.apache.tika.parser.ParsingReader$ParsingTask.run()
        java.lang.Thread.run()

        n.beekman Niels Beekman added a comment -

        Attached is a patch that fixes this issue (diff against tags/1.0). Additionally, the reader is disposed in a try/finally construct.
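
The attached patch is not reproduced in this issue. As a rough sketch of the pattern the comment describes, the example below closes the ImageInputStream and disposes the ImageReader in nested try/finally blocks. The class name ImageStreamCloseSketch and the method formatNameOf are hypothetical and are not part of Tika; the real ImageParser does more work between opening and closing the stream.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical illustration only; not the code from patch.svn.diff.
public class ImageStreamCloseSketch {

    /** Returns the format name of the image, or null if no reader recognizes it. */
    public static String formatNameOf(InputStream in) throws IOException {
        // With caching enabled this may be a FileCacheImageInputStream,
        // which keeps a temporary file open until close() is called.
        ImageInputStream iis = ImageIO.createImageInputStream(in);
        if (iis == null) {
            return null; // no stream provider could handle the input
        }
        try {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                return null;
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(iis);
                return reader.getFormatName();
            } finally {
                // Release resources held by the reader, even on failure.
                reader.dispose();
            }
        } finally {
            // Always release the image stream, and with it any cache file.
            iis.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny in-memory PNG so the example needs no external file.
        BufferedImage image = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(image, "png", out);
        System.out.println(formatNameOf(new ByteArrayInputStream(out.toByteArray())));
    }
}
```

Closing the ImageInputStream does not close the InputStream it wraps, so the caller stays responsible for the original stream. On Java 7 and later the outer try/finally could be written as try-with-resources, since ImageInputStream implements Closeable.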

        mikemccand Michael McCandless added a comment -

        Thanks Niels, patch looks good! I'll commit shortly...


          People

          • Assignee:
            mikemccand Michael McCandless
            Reporter:
            n.beekman Niels Beekman
