Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 2.0.6
Fix Version/s: None
Component/s: None
Description
I got a java.io.IOException in PDFBox code through Tika HTML extraction:
org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@4082e5e4
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 16 more
Caused by: java.io.IOException: Missing root object specification in trailer.
    at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2225)
    at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:227)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:276)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1132)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1066)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:141)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 24 more
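
For anyone hitting the same trace, here is a minimal sketch of how calling code might detect this particular failure. In the trace above the PDFBox IOException is wrapped in a TikaException, so the sketch walks the cause chain using only the JDK. The class RootCauseFinder and its methods are hypothetical helpers, not part of Tika or PDFBox, and matching on the message text is an assumption that may break if the wording changes between versions.

```java
import java.io.IOException;

// Hypothetical helper for illustration; not part of Tika or PDFBox.
final class RootCauseFinder {
    private RootCauseFinder() {}

    // Message text copied from the stack trace in this report (assumed stable).
    static final String MISSING_ROOT = "Missing root object specification in trailer";

    /** Returns the first IOException found in the cause chain of t, or null if none. */
    static IOException findIOException(Throwable t) {
        int depth = 0; // guard against unusually long or cyclic cause chains
        for (Throwable cur = t; cur != null && depth < 64; cur = cur.getCause(), depth++) {
            if (cur instanceof IOException) {
                return (IOException) cur;
            }
        }
        return null;
    }

    /** True if the chain contains an IOException whose message mentions the missing root object. */
    static boolean isMissingRootObject(Throwable t) {
        IOException io = findIOException(t);
        return io != null
                && io.getMessage() != null
                && io.getMessage().contains(MISSING_ROOT);
    }
}
```

A caller could catch the exception thrown by the parse call and pass it to RootCauseFinder.isMissingRootObject(e) to decide whether to skip the file as a malformed PDF or to rethrow.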