Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 2.0.6
Fix Version/s: None
Component/s: None
Description
I got an exception while extracting text from a PDF with Tika (the exception is thrown from PDFBox code):
org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@2be78cf6
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
Caused by: java.io.IOException: Wrong type of referenced length object COSObject{11, 0}: COSNull
	at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:908)
	at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:950)
	at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:781)
	at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:742)
	at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:673)
	at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:633)
	at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:241)
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:276)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1132)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1066)
	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:141)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	... 24 more