PDFBox / PDFBOX-4367

Error expected floating point number actual='18-5'


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.12
    • Fix Version/s: 2.0.13, 3.0.0 PDFBox
    • Component/s: Text extraction
    • Labels: None
    • Environment: Mac OS X Sierra

    Description

      I can reproduce this from the command line. Unfortunately, the only files that trigger it come from a customer and contain sensitive information. The file opens without error in Acrobat Reader and Mac Preview. The desired result is that any corrupt portions of the PDF are skipped, so that we can still use whatever text is extractable.

      Unfortunately, I still get an error when using the -force option.

      We get the following stack trace:

      C02V390UHTD6:Downloads pjohnson$ java -jar pdfbox-app-2.0.12.jar ExtractText 16cccd9af5032a303774f7b87fb95076.pdf
      Nov 02, 2018 10:04:54 AM org.apache.pdfbox.pdfparser.BaseParser parseCOSArray
      WARNING: Corrupt object reference at offset 19727
      Exception in thread "main" java.io.IOException: Error expected floating point number actual='18-5'
      at org.apache.pdfbox.cos.COSFloat.<init>(COSFloat.java:78)
      at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:110)
      at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:947)
      at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:631)
      at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:174)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:510)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:477)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
      at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
      at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
      at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
      at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
      at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:237)
      at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82)
      at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60)
      Caused by: java.lang.NumberFormatException
      at java.math.BigDecimal.<init>(BigDecimal.java:494)
      at java.math.BigDecimal.<init>(BigDecimal.java:383)
      at java.math.BigDecimal.<init>(BigDecimal.java:806)
      at org.apache.pdfbox.cos.COSFloat.<init>(COSFloat.java:59)
      ... 14 more
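
      The "Caused by" section of the trace can be reproduced without PDFBox: `java.math.BigDecimal` rejects the token `18-5`. The sketch below shows that failure and one possible way to get the "skip what is corrupt" behaviour requested above. It is only an illustration: the class `LenientNumberSketch` and its methods are hypothetical, they are not PDFBox code, and they are not necessarily how the issue was fixed.

```java
import java.io.IOException;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not PDFBox code and not the fix applied for this issue.
public class LenientNumberSketch {

    // Mirrors the failure mode in the trace: BigDecimal rejects "18-5",
    // and the NumberFormatException is wrapped in an IOException.
    public static BigDecimal parseStrict(String token) throws IOException {
        try {
            return new BigDecimal(token);
        } catch (NumberFormatException e) {
            throw new IOException(
                "Error expected floating point number actual='" + token + "'", e);
        }
    }

    // Assumed recovery policy: keep operands that parse, drop those that do not.
    public static List<BigDecimal> parseSkippingMalformed(List<String> tokens) {
        List<BigDecimal> parsed = new ArrayList<>();
        for (String token : tokens) {
            try {
                parsed.add(parseStrict(token));
            } catch (IOException e) {
                // A real parser would log a warning here before moving on.
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        try {
            parseStrict("18-5");
        } catch (IOException e) {
            System.out.println(e.getMessage());
            System.out.println("cause: " + e.getCause().getClass().getName());
        }
        // The malformed middle token is dropped; the other two are kept.
        System.out.println(parseSkippingMalformed(Arrays.asList("12", "18-5", "0.5")));
    }
}
```

      Silently dropping an operand changes the meaning of the surrounding content stream operator, so a real parser would have to decide whether to skip the single token, the whole operator, or the rest of the page.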
      


          People

            Assignee: Tilman Hausherr (tilman)
            Reporter: Peter Johnson (pjohnson@proofpoint.com)
            Votes: 0
            Watchers: 4
