Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Not A Problem
Affects Version/s: 2.0.20, 2.0.21
Environment: Debian, macOS, OpenJDK 12
Description
I get an IOException when I try to open some PDFs with the library (calling PDDocument.load(pdfFile)). Here are some URLs of affected PDFs (I think it's the same problem for all of them):
- https://www.buerger.uni-frankfurt.de/80977779/Rehbein_Schule_Hanau_9_2018.pdf
- http://www.geislerfarms.com/documents/filelibrary/Geisler_COVID_statement_0A7A094E1EFB7.pdf
- http://www.sahealth.sa.gov.au/wps/wcm/connect/c736e1d5-932e-4f8a-8e56-52ab10a214fd/SALHN+Governing+Board+Minutes+-+5+March+2020.pdf?MOD=AJPERES&CACHEID=ROOTWORKSPACE-c736e1d5-932e-4f8a-8e56-52ab10a214fd-niR9I3J
I think the files are not well formed and don't respect the PDF specification, but I can open them with other PDF viewers (Chrome's PDF viewer, for example).
Here is the stack trace:
java.io.IOException: Error: End-of-File, expected line
    at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1098)
    at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2581)
    at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2560)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1099)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1082)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1041)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:989)
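One way to narrow this down, offered only as a sketch: the trace fails in parseHeader on the first readLine, which would be consistent with the parser receiving no bytes at all (for example an empty or failed download). That is an assumption, not something established in this report. The check below uses only the JDK and looks for the "%PDF-" marker in the first 1024 bytes of the file before it is handed to PDDocument.load. The class name PdfHeaderCheck, the method hasPdfHeader and the 1024-byte window are hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not part of PDFBox.
class PdfHeaderCheck {
    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // Assumed search window; chosen for illustration only.
    private static final int WINDOW = 1024;

    // Returns true if "%PDF-" occurs within the first WINDOW bytes of the file.
    // An empty file always returns false.
    static boolean hasPdfHeader(Path file) throws IOException {
        byte[] head;
        try (InputStream in = Files.newInputStream(file)) {
            head = in.readNBytes(WINDOW); // requires Java 11 or later
        }
        for (int i = 0; i + MARKER.length <= head.length; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (head[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        System.out.println("size in bytes: " + Files.size(file));
        System.out.println("has %PDF- marker: " + hasPdfHeader(file));
        // If both look sane, the next step would be PDDocument.load(file.toFile()).
    }
}
```

If the reported size is 0 or the marker is missing, the saved file is probably not the PDF that the browser displays (an HTML error page or a truncated download, say), and it would be worth checking the download step before the parser.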