Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
In the latest regression run with PDFBox's 2.x branch, we're now getting very slow processing on a truncated PDF with PDFBox app's ExtractText:
Turns out this is not an infinite loop. After 4.5 minutes, ExtractText eventually ended with:
Exception in thread "main" java.io.IOException: Missing root object specification in trailer.
    at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2508)
    at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:193)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1012)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:950)
    at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192)
    at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60)