Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
2.0.0
-
None
Description
When PDF files are truncated, one of the most common exceptions in PDFBox 2.0.0 is:
java.io.IOException: Error reading stream, expected='endstream' actual='' at offset 165888 at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:999) at org.apache.pdfbox.pdfparser.COSParser.parseXrefObjStream(COSParser.java:326) at org.apache.pdfbox.pdfparser.COSParser.parseXref(COSParser.java:287) at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:192) at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:847)
There are two files in govdocs1 that are NOT truncated and trigger this exception in 2.0.0, but were parsed by PDFBox 1.8.11 with the classic parser.