Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Not A Bug
Affects Version/s: 2.0.25
Fix Version/s: None
Component/s: None
Description
Hi,
I have a PDF file that throws the following error when I try to parse it:
Caused by: java.io.IOException: Page tree root must be a dictionary
    at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1228)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1202)
    at org.apache.tika.parser.pdf.PDFParser.getPDDocument(PDFParser.java:191)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:149)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:289)
    ... 5 more
I have attached the file in question to this issue.
This might be related to PDFBOX-4915.