  1. PDFBox
  2. PDFBOX-3887

Getting a "DataFormatException: invalid distance too far back" exception for the attached file

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.7
    • Fix Version/s: 2.0.8, 3.0.0 PDFBox
    • Component/s: Text extraction
    • Environment: Windows 10 64-bit, Ubuntu 14.04 64-bit.

      java version "1.8.0_141"
      Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
      Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)

    Description

      PDFBox throws the following exception when loading the attached file:

      Caused by: java.io.IOException: java.util.zip.DataFormatException: invalid distance too far back
      	at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:82)
      	at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
      	at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
      	at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:55)
      	at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:847)
      	at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:753)
      	at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:678)
      	at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:638)
      	at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:236)
      	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:271)
      	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:984)
      	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:940)
      	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:888)
      	at com.diligen.parser.pdf.PdfBoxHelper.getDocumentWithLineSegments(PdfBoxHelper.java:131)
      	... 7 more
      Caused by: java.util.zip.DataFormatException: invalid distance too far back
      	at java.util.zip.Inflater.inflateBytes(Native Method)
      	at java.util.zip.Inflater.inflate(Inflater.java:259)
      	at java.util.zip.Inflater.inflate(Inflater.java:280)
      	at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:107)
      	at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:73)
      	... 20 more
      

      If there is no quick solution for this bug, is there a workaround? Can I somehow catch the exception and take some action?
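
On the workaround question: the trace shows the `DataFormatException` wrapped in an `IOException` from `PDDocument.load`, so the caller can catch that `IOException` and inspect its cause chain. The sketch below uses only `java.util.zip` and is not PDFBox's actual fix. The class name `FlateWorkaroundSketch` and both method names are hypothetical. The first helper reports whether a corrupt Flate stream caused the failure. The second shows one possible lenient strategy, assuming partial output is acceptable: keep whatever bytes were inflated before the error.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper class, for illustration only.
public class FlateWorkaroundSketch {

    /** Returns true if a DataFormatException appears anywhere in the cause chain. */
    public static boolean causedByCorruptFlate(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof DataFormatException) {
                return true;
            }
        }
        return false;
    }

    /** Inflates as much as possible; on corrupt data, keeps the bytes decoded so far. */
    public static byte[] inflateLeniently(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[2048];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input, or a preset dictionary is required
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            // Corrupt stream: stop here and return the partial result.
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

In the reporter's code, this would mean wrapping the `PDDocument.load(...)` call in a `try`/`catch (IOException e)` and calling `causedByCorruptFlate(e)` to decide whether to skip the file, log it, or retry with a repaired copy.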

      Attachments

        1. non-contract_00025.pdf
          285 kB
          Harun Reşit Zafer

          People

            Assignee: Tilman Hausherr (tilman)
            Reporter: Harun Reşit Zafer (hrzafer)