Tika / TIKA-662

Prevent creation of ZipInputStreamZipEntrySource when reading files from disk


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10
    • Component/s: None
    • Labels: None

    Description

      POI provides two ways to open an OPCPackage: from an InputStream or from a File. Creating an OPCPackage from an InputStream causes creation of a ZipInputStreamZipEntrySource, which buffers all uncompressed data in memory. This consumes a lot of memory and is unnecessary when we are reading files from disk or have already copied the stream into a temporary file.

      This patch removes usage of ZipInputStreamZipEntrySource in this case.
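
      The memory difference between the two access paths can be illustrated with plain java.util.zip, without POI. This is only a sketch of the general idea, not the code from the attached patch: the class name ZipAccessSketch, its helper methods, and the sample entry names are all made up for illustration. Stream-based reading has to inflate and keep every entry to offer random access, while file-based reading inflates only the entry that is requested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not part of Tika or POI.
class ZipAccessSketch {

    // Stream-based access: a ZIP stream can only be read front to back, so
    // random access to entries means inflating and holding all of them.
    static Map<String, byte[]> bufferAllEntries(InputStream in) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                entries.put(e.getName(), readFully(zip));
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return entries;
    }

    // File-based access: the central directory locates the entry, and only
    // that one entry is inflated. Returns null if the entry does not exist.
    static byte[] readSingleEntry(File file, String name) {
        try (ZipFile zip = new ZipFile(file)) {
            ZipEntry entry = zip.getEntry(name);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return readFully(in);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Reads the stream (or the current ZIP entry) to its end.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        for (int n = in.read(buf); n != -1; n = in.read(buf)) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Builds a tiny two-entry archive for the demo (entry names are made up).
    static byte[] sampleZipBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write("<document/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("docProps/core.xml"));
            zip.write("<core/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Writes the bytes to a temporary file that is deleted on JVM exit.
    static File writeTempFile(byte[] data) {
        try {
            File f = File.createTempFile("sketch", ".zip");
            f.deleteOnExit();
            Files.write(f.toPath(), data);
            return f;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] zipBytes = sampleZipBytes();
        // Every entry ends up on the heap:
        System.out.println(bufferAllEntries(new ByteArrayInputStream(zipBytes)).keySet());
        // Only the requested entry is inflated:
        System.out.println(readSingleEntry(writeTempFile(zipBytes), "word/document.xml").length);
    }
}
```

      In POI terms, the first method corresponds roughly to what happens behind OPCPackage.open(InputStream), and the second to OPCPackage.open(File).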

      Unfortunately, it breaks ZIP-bomb prevention for the OOXML parser (and other parsers that use TikaInputStream.getFile()). I think ZIP-bomb prevention should additionally be implemented for those formats before this is committed to SVN.
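
      One possible shape for such a guard on the file-based path is to count the bytes actually inflated and abort once a limit is exceeded. The sketch below is an assumption about how this could be done, not Tika's existing ZIP-bomb protection and not part of the attached patch; BoundedInflationSketch, inflateWithLimit and the limit parameter are hypothetical names. It counts bytes as they are read rather than trusting the sizes declared in the archive headers, since those can be forged.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not part of Tika or POI.
class BoundedInflationSketch {

    // Inflates every entry of the archive, aborting as soon as the running
    // total of uncompressed bytes exceeds maxTotalBytes.
    // Returns the total number of uncompressed bytes on success.
    static long inflateWithLimit(File file, long maxTotalBytes) {
        long total = 0;
        byte[] buf = new byte[8192];
        try (ZipFile zip = new ZipFile(file)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                try (InputStream in = zip.getInputStream(entries.nextElement())) {
                    for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                        total += n;
                        if (total > maxTotalBytes) {
                            throw new IllegalStateException(
                                    "Uncompressed size exceeds " + maxTotalBytes + " bytes");
                        }
                    }
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return total;
    }

    // Builds an archive with one highly compressible entry of `size` zero bytes.
    static byte[] zipOfZeros(int size) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("zeros.bin"));
            zip.write(new byte[size]);
            zip.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Writes the bytes to a temporary file that is deleted on JVM exit.
    static File writeTempFile(byte[] data) {
        try {
            File f = File.createTempFile("bounded", ".zip");
            f.deleteOnExit();
            Files.write(f.toPath(), data);
            return f;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        File archive = writeTempFile(zipOfZeros(1 << 20));
        System.out.println("compressed bytes on disk: " + archive.length());
        System.out.println("inflated bytes: " + inflateWithLimit(archive, 2L << 20));
        try {
            inflateWithLimit(archive, 1000);
        } catch (IllegalStateException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

      A real parser would apply the limit while it consumes the entries rather than in a separate pass, and might prefer a compression-ratio threshold over an absolute size; both are design choices this sketch leaves open.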

      Attachments

        1. TIKA-662.patch
          4 kB
          Maxim Valyanskiy

          People

            Assignee: Unassigned
            Reporter: Maxim Valyanskiy (maxim.valyanskiy)
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved: