Details
Type: Task
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.0.9
Fix Version/s: None
Description
On TIKA-2619, fd shared a document that triggers an OOM in 2.0.8. I just confirmed that the same document also triggers the OOM with 2.0.9 when using the plain PDFBox app's ExtractText tool, with no Tika involved. The triggering document is attached to TIKA-2619.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Unknown Source)
	at java.util.ArrayList.grow(Unknown Source)
	at java.util.ArrayList.ensureExplicitCapacity(Unknown Source)
	at java.util.ArrayList.ensureCapacityInternal(Unknown Source)
	at java.util.ArrayList.addAll(Unknown Source)
	at org.apache.pdfbox.cos.COSArray.addAll(COSArray.java:124)
	at org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState.getLineDashPattern(PDExtendedGraphicsState.java:280)
	at org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState.copyIntoGraphicsState(PDExtendedGraphicsState.java:89)
	at org.apache.pdfbox.contentstream.operator.state.SetGraphicsStateParameters.process(SetGraphicsStateParameters.java:61)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:503)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:477)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
	at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
	at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
	at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
	at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
	at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:237)
	at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82)
	at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60)
Issue Links
- causes: TIKA-2619 Memory leak: PDF meta data detection fails with OutOfMemoryError (Status: Open)