Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 2.0.7
Fix Version/s: None
Component/s: None
Description
I got the following exception while extracting HTML from a PDF file:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@7ca231e4
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
...
Caused by: java.lang.IllegalArgumentException: root cannot be null
at org.apache.pdfbox.pdmodel.PDPageTree.<init>(PDPageTree.java:75)
at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getPages(PDDocumentCatalog.java:129)
at org.apache.pdfbox.pdmodel.PDDocument.getNumberOfPages(PDDocument.java:1398)
at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:243)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:154)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 25 more