TIKA-2447

PSDParser creates unnecessary large byte array and discards it

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.15, 1.16
    • Fix Version/s: 1.17
    • Component/s: parser
    • Labels: None
    • Environment: openjdk version "1.8.0_131"
      little memory (currently running with 256M Xmx)

    Description

      PSD (Adobe Photoshop) files are split into ResourceBlocks that contain different kinds of data, but only caption blocks are currently extracted into the description.
      When parsing a file with very large blocks, e.g. blocks holding image data, a byte array the size of the block is allocated:
      https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L191

      even though it is discarded right afterwards:
      https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L116 and following lines

      This causes huge memory consumption and eventually killed the application with an OutOfMemoryError.

      java.lang.OutOfMemoryError: Java heap space
              at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:191) ~[tika-parsers-1.15.jar!/:1.15]
              at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:141) ~[tika-parsers-1.15.jar!/:1.15]
              at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:116) ~[tika-parsers-1.15.jar!/:1.15]
              at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
              at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
              at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[tika-core-1.15.jar!/:1.15]
      

      I cannot provide a file to reproduce this, because the file that caused the issue is owned by one of our customers.
      I will prepare a pull request to fix this.
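
      One possible direction, as a minimal sketch and not the actual Tika patch: read only the block header, allocate a buffer only for blocks whose ID is of interest, and skip the payload of every other block. The block layout used here (2-byte ID, 4-byte length, payload) is a simplified stand-in for the real PSD resource block layout, and the names BlockSkipSketch, readBlockData, skipFully and the ID values are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class BlockSkipSketch {

    /**
     * Reads one block in the simplified layout assumed for this sketch:
     * 2-byte ID, 4-byte length, then the payload.
     * Returns the payload only if the ID matches wantedId; otherwise the
     * payload is skipped without allocating a buffer and null is returned.
     */
    static byte[] readBlockData(DataInputStream in, int wantedId) throws IOException {
        int id = in.readUnsignedShort();
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative block length: " + length);
        }
        if (id != wantedId) {
            skipFully(in, length); // no byte[length] allocation for unwanted blocks
            return null;
        }
        byte[] data = new byte[length];
        in.readFully(data);
        return data;
    }

    /** Skips exactly n bytes or throws EOFException if the stream ends first. */
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() may return 0 without being at EOF, so probe with read()
                if (in.read() < 0) {
                    throw new EOFException("Stream ended with " + remaining + " bytes left to skip");
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        final int wantedId = 1;  // hypothetical ID of the block we care about
        final int otherId = 7;   // hypothetical ID of a large block we ignore

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeShort(otherId);
        out.writeInt(1000);
        out.write(new byte[1000]); // stands in for a large image data block
        byte[] caption = "hi".getBytes(StandardCharsets.US_ASCII);
        out.writeShort(wantedId);
        out.writeInt(caption.length);
        out.write(caption);

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        byte[] first = readBlockData(in, wantedId);
        byte[] second = readBlockData(in, wantedId);
        System.out.println("first block: " + (first == null ? "skipped" : "read"));
        System.out.println("second block: " + new String(second, StandardCharsets.US_ASCII));
    }
}
```

      With this approach the memory needed per block is bounded by the size of the blocks that are actually extracted, not by the largest block in the file. A size limit on the wanted blocks could be added as a further safeguard.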

          People

            Assignee: Unassigned
            Reporter: Jan Burkhardt (bjrke)