PDFBox / PDFBOX-3887

Getting a "DataFormatException: invalid distance too far back" exception for the attached file

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.7
    • Fix Version/s: 2.0.8, 3.0.0 PDFBox
    • Component/s: Text extraction
    • Environment:
      Windows 10 64-bit, Ubuntu 14.04 64-bit.

      java version "1.8.0_141"
      Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
      Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)

      Description

      PDFBox throws the following exception when loading the attached PDF:

      Caused by: java.io.IOException: java.util.zip.DataFormatException: invalid distance too far back
      	at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:82)
      	at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
      	at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
      	at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:55)
      	at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:847)
      	at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:753)
      	at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:678)
      	at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:638)
      	at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:236)
      	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:271)
      	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:984)
      	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:940)
      	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:888)
      	at com.diligen.parser.pdf.PdfBoxHelper.getDocumentWithLineSegments(PdfBoxHelper.java:131)
      	... 7 more
      Caused by: java.util.zip.DataFormatException: invalid distance too far back
      	at java.util.zip.Inflater.inflateBytes(Native Method)
      	at java.util.zip.Inflater.inflate(Inflater.java:259)
      	at java.util.zip.Inflater.inflate(Inflater.java:280)
      	at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:107)
      	at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:73)
      	... 20 more
      

      If there is no quick solution for this bug, is there a workaround? Can I somehow catch the exception and take some action?
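
      On the workaround question: the trace shows the DataFormatException wrapped in an IOException that propagates out of PDDocument.load, so one option is to catch the IOException at the call site and inspect its cause chain. The following is a minimal sketch of that idea using only the JDK. The class FlateErrorCheck and the method causedByBadDeflateData are hypothetical names introduced for illustration, and the thrown exception in main is a stand-in for the real PDDocument.load call.

```java
import java.io.IOException;
import java.util.zip.DataFormatException;

public class FlateErrorCheck {

    /**
     * Hypothetical helper: returns true if the given throwable, or any
     * exception in its cause chain, is a java.util.zip.DataFormatException.
     */
    public static boolean causedByBadDeflateData(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof DataFormatException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        try {
            // Stand-in for PDDocument.load(file): simulates the wrapped
            // exception seen in the stack trace above.
            throw new IOException(
                    new DataFormatException("invalid distance too far back"));
        } catch (IOException e) {
            if (causedByBadDeflateData(e)) {
                // Assumed policy: log and skip the damaged file.
                System.out.println("corrupt Flate stream, skipping file");
            } else {
                System.out.println("other I/O error: " + e.getMessage());
            }
        }
    }
}
```

      In real code the try block would wrap the PDDocument.load call from PdfBoxHelper, and the catch branch could skip the file or queue it for manual review. This only detects the failure; it does not recover any content from the damaged stream.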

        Attachments

        1. non-contract_00025.pdf
          285 kB
          Harun Reşit Zafer

            People

            • Assignee:
              tilman Tilman Hausherr
              Reporter:
              hrzafer Harun Reşit Zafer
            • Votes:
              0
              Watchers:
              3
