Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
2.0.7
None
Description
I got an exception when extracting TXT or HTML from a PDF file:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@429568e8
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ...
Caused by: java.lang.IllegalArgumentException: Illegal Capacity: -1
    at java.util.ArrayList.<init>(ArrayList.java:142)
    at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:72)
    at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:845)
    at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:748)
    at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:673)
    at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:633)
    at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:241)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:276)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1143)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1077)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:149)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 25 more
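
The innermost frame in the trace is the `java.util.ArrayList` constructor rejecting a capacity of -1. The sketch below reproduces only that JDK behaviour in isolation. It is not the PDFBox code or the actual fix: the class `CapacityDemo`, the helpers `tryAllocate` and `allocateSafely`, and the parameter name `declaredCount` are hypothetical, and the idea that the -1 came from a missing or malformed object count in the PDF's object stream is an assumption, not something the report confirms.

```java
import java.util.ArrayList;
import java.util.List;

class CapacityDemo {
    // Hypothetical helper: returns the exception message if allocation fails,
    // or null if the list could be created with the given capacity.
    static String tryAllocate(int declaredCount) {
        try {
            List<Object> objects = new ArrayList<>(declaredCount);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    // Hypothetical defensive variant: a negative count is treated as zero,
    // so the caller gets an empty list instead of an exception.
    static List<Object> allocateSafely(int declaredCount) {
        return new ArrayList<>(Math.max(0, declaredCount));
    }

    public static void main(String[] args) {
        System.out.println(tryAllocate(-1));           // message from the JDK
        System.out.println(allocateSafely(-1).size()); // empty list
    }
}
```

Whether clamping, skipping the object stream, or failing with a clearer parse error is the right response depends on how lenient the parser is meant to be; the sketch only shows that the unchecked negative value is what triggers the `IllegalArgumentException`.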