Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
2.0.12
-
None
-
Mac OS X Sierra
Description
Able to repeat with command line. Unfortunately, the only files that repeat this are from a customer, and contain sensitive information. The file opens without error in Acrobat Reader and Mac Preview. The desired result is that any corrupt portions of the PDF are skipped, so that we can use what text is extractable.
Unfortunately, I still get an error when using the -force option.
We get the following stack trace:
C02V390UHTD6:Downloads pjohnson$ java -jar pdfbox-app-2.0.12.jar ExtractText 16cccd9af5032a303774f7b87fb95076.pdf Nov 02, 2018 10:04:54 AM org.apache.pdfbox.pdfparser.BaseParser parseCOSArray WARNING: Corrupt object reference at offset 19727 Exception in thread "main" java.io.IOException: Error expected floating point number actual='18-5' at org.apache.pdfbox.cos.COSFloat.<init>(COSFloat.java:78) at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:110) at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:947) at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:631) at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:174) at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:510) at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:477) at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150) at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139) at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391) at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319) at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266) at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:237) at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82) at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60) Caused by: java.lang.NumberFormatException at java.math.BigDecimal.<init>(BigDecimal.java:494) at java.math.BigDecimal.<init>(BigDecimal.java:383) at java.math.BigDecimal.<init>(BigDecimal.java:806) at org.apache.pdfbox.cos.COSFloat.<init>(COSFloat.java:59) ... 14 more