Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 2.0.14
- None
Description
I got an exception while extracting text from a PDF using Tika. I can't attach a file to reproduce it due to confidentiality.
java.io.IOException: expected number, actual=COSFloat{18446744071795507394} at offset 9277
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:166)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:279)
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
    at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:864)
    at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:904)
    at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:873)
    at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:793)
    at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:753)
    at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1200)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1173)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:154)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
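A note on the value in the message: 18446744071795507394 is larger than Long.MAX_VALUE, so it cannot be held in a signed 64-bit integer. That may be why the parser ended up with a COSFloat where it expected an integer, but this is an assumption and is not confirmed anywhere in this report. The sketch below uses only the JDK (no PDFBox calls); the class and method names are hypothetical and exist purely for illustration.

```java
import java.math.BigInteger;

public class CosFloatValueCheck {
    // The literal reported in the exception message above.
    static final BigInteger REPORTED = new BigInteger("18446744071795507394");

    // True when the value cannot be represented as a signed 64-bit long.
    static boolean exceedsLong(BigInteger value) {
        return value.compareTo(BigInteger.valueOf(Long.MAX_VALUE)) > 0
                || value.compareTo(BigInteger.valueOf(Long.MIN_VALUE)) < 0;
    }

    // Reinterprets the low-order 64 bits of the value as a signed long
    // (two's complement), which is what BigInteger.longValue() returns.
    static long asSigned64(BigInteger value) {
        return value.longValue();
    }

    public static void main(String[] args) {
        System.out.println("exceeds long: " + exceedsLong(REPORTED));
        System.out.println("as signed 64-bit: " + asSigned64(REPORTED));
    }
}
```

Read as a signed 64-bit quantity, the reported value is -1914044222, which would fit in a 32-bit int. One possible explanation is that the producing software wrote a negative integer as an unsigned 64-bit number, but that is a guess without the original file.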
Attachments
Issue Links
- duplicates
  - PDFBOX-4385 IOException "expected number, actual=COSFloat{18446744073430152624}" when loading PDF (Closed)
- is related to
  - TIKA-2842 Expected number, actual=COSFloat (Open)