  1. Tika
  2. TIKA-2447

PSDParser allocates an unnecessarily large byte array and then discards it

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.15, 1.16
    • Fix Version/s: 1.17
    • Component/s: parser
    • Labels:
      None
    • Environment:

      openjdk version "1.8.0_131"
      low memory (currently running with -Xmx256M)

      Description

      PSD files (Adobe Photoshop) are split into ResourceBlocks that contain different kinds of data, but only caption blocks are currently extracted into the description.
      When parsing a file with very large blocks, e.g. image data, a byte array the size of the block is allocated:
      https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L191

      even though it is discarded right afterwards:
      https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L116 (and the following lines)

      This causes very high memory consumption and eventually killed the application with an OutOfMemoryError.

      java.lang.OutOfMemoryError: Java heap space
              at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:191) ~[tika-parsers-1.15.jar!/:1.15]
              at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:141) ~[tika-parsers-1.15.jar!/:1.15]
              at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:116) ~[tika-parsers-1.15.jar!/:1.15]
              at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
              at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
              at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[tika-core-1.15.jar!/:1.15]
      

      I cannot provide a file that reproduces this, because the file that triggered the issue belongs to one of our customers.
      I will prepare a pull request with a fix.
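
      To illustrate the idea, here is a minimal sketch of how a block reader could skip over payloads it is going to discard anyway, instead of allocating an array for them. It is not the actual patch and not PSDParser code: the class ResourceBlockSketch, the helpers readBlockData and skipFully, and the constant WANTED_ID are hypothetical names introduced only for this example. It assumes the block ID and data length have already been read from the block header.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; the names below do not come from PSDParser.
class ResourceBlockSketch {

    // Assumed ID of the only block type whose payload is needed.
    static final int WANTED_ID = 1;

    // Returns the payload of a wanted block, or null after skipping
    // an unwanted block without allocating a buffer for it.
    static byte[] readBlockData(InputStream in, int id, int dataLen) throws IOException {
        if (id != WANTED_ID) {
            skipFully(in, dataLen);
            return null;
        }
        byte[] data = new byte[dataLen];
        int off = 0;
        while (off < dataLen) {
            int n = in.read(data, off, dataLen - off);
            if (n < 0) {
                throw new EOFException("stream ended inside a block");
            }
            off += n;
        }
        return data;
    }

    // Skips exactly n bytes or throws if the stream ends first.
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
                continue;
            }
            // skip() may return 0 before end of stream, so probe with a one-byte read.
            if (in.read() < 0) {
                throw new EOFException("stream ended with " + n + " bytes left to skip");
            }
            n--;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30, 40, 50, 60});
        byte[] skipped = readBlockData(in, 2, 4);       // unwanted block of 4 bytes
        byte[] kept = readBlockData(in, WANTED_ID, 2);  // wanted block of 2 bytes
        System.out.println(skipped == null);
        System.out.println(kept[0] + "," + kept[1]);
    }
}
```

      With this approach, the memory used for an unwanted block no longer depends on its size. A wanted block with a very large declared length would still trigger a large allocation, so a real fix might also cap the size it is willing to read.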

            People

            • Assignee:
              Unassigned
              Reporter:
              bjrke Jan Burkhardt