Description
PSD files (Adobe Photoshop) are split into ResourceBlocks, each containing different data, but only caption blocks are currently extracted into the description.
When parsing a file with very large blocks, e.g. blocks holding image data, a byte array the size of the entire block is allocated:
https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L191
even if the block's data is discarded right afterwards:
https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L116 and following lines
This causes huge memory consumption and eventually killed our application with an OutOfMemoryError:
java.lang.OutOfMemoryError: Java heap space
    at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:191) ~[tika-parsers-1.15.jar!/:1.15]
    at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:141) ~[tika-parsers-1.15.jar!/:1.15]
    at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:116) ~[tika-parsers-1.15.jar!/:1.15]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[tika-core-1.15.jar!/:1.15]
I cannot provide a file that reproduces the problem, because the file that triggered it is owned by one of our customers.
I will prepare a pull request to fix that.
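To illustrate the general idea (not the actual patch), here is a minimal sketch of reading a block header first and buffering the payload only when the block is one we care about. Everything in it is an assumption made for illustration: the class SelectiveBlockReader, the constants CAPTION_ID and MAX_BUFFERED_BYTES, and the simplified block layout (2-byte ID, 4-byte length, payload), which is not the full PSD resource block format and not Tika's API.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Tika. Reads blocks in a simplified layout
// (2-byte ID, 4-byte length, payload) and buffers only the wanted ones.
final class SelectiveBlockReader {
    // Assumed ID standing in for "caption" blocks.
    static final int CAPTION_ID = 0x03F0;
    // Assumed upper bound for a buffered payload; arbitrary value for this sketch.
    static final int MAX_BUFFERED_BYTES = 1 << 20;

    private SelectiveBlockReader() {
    }

    /** Returns the payload of the next block if it is wanted, or null if it was skipped. */
    static byte[] readNextBlock(DataInputStream in) throws IOException {
        int id = in.readUnsignedShort();
        long length = in.readInt() & 0xFFFFFFFFL; // interpret the length as unsigned

        // Unwanted or oversized blocks are skipped without allocating a buffer.
        if (id != CAPTION_ID || length > MAX_BUFFERED_BYTES) {
            skipFully(in, length);
            return null;
        }

        byte[] data = new byte[(int) length];
        in.readFully(data);
        return data;
    }

    /** Skips exactly n bytes or throws EOFException if the stream ends early. */
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() may return 0 before EOF, so probe with a single read.
                if (in.read() < 0) {
                    throw new EOFException("Stream ended with " + remaining + " bytes left to skip");
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
    }
}
```

With this approach the memory used per block is bounded by MAX_BUFFERED_BYTES regardless of the length declared in the file, and a large image data block costs no heap at all because it is never buffered.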