PDFBOX-2976 (project: PDFBox)

java.util.zip.DataFormatException: incorrect data check

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 1.8.11, 2.0.0
    • Component/s: Parsing
    • Labels:
      None
    • Environment:
      Linux Mint 17.2 x64, JDK7u79, Glassfish 3.1.2.2

      Description

      When trying to open certain PDF files (examples attached; also any MSDS available at http://www.scbt.com/datasheet-356376.html), an exception is thrown and the file cannot be parsed:
      java.io.IOException: java.util.zip.DataFormatException: incorrect data check
      at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
      at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:78)
      at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:160)
      at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:143)
      at org.apache.pdfbox.pdmodel.PDPage.getContents(PDPage.java:148)
      at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:92)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:450)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:437)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:148)
      at org.apache.pdfbox.text.PDFTextStreamEngine.processPage(PDFTextStreamEngine.java:117)
      at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:367)
      at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:303)
      at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:248)
      at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:209)

      – or, when rendering the page instead of extracting text –

      java.io.IOException: java.util.zip.DataFormatException: incorrect data check
      at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
      at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:78)
      at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:160)
      at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:143)
      at org.apache.pdfbox.pdmodel.PDPage.getContents(PDPage.java:148)
      at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:92)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:450)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:437)
      at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:148)
      at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:179)
      at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:205)
      at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:136)
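      "incorrect data check" is the message zlib produces when the Adler-32 checksum stored in the last 4 bytes of a zlib stream does not match the decompressed bytes. The compressed data itself may still be intact. The standalone Java sketch below (no PDFBox involved; the class FlateChecksumDemo and its methods are hypothetical names) reproduces the error with java.util.zip.Inflater and shows one lenient workaround: skip the 2-byte zlib header and inflate in nowrap (raw deflate) mode, so the trailer is never checked. It is an illustration under those assumptions, not necessarily what the attached PDFBOX2976_FlateFilter2.patch does.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateChecksumDemo {

    // Compress in zlib format: 2-byte header + deflate data + 4-byte Adler-32 trailer,
    // the layout a /FlateDecode stream normally has.
    static byte[] zlibCompress(byte[] input) {
        Deflater deflater = new Deflater(); // default constructor: zlib wrapper enabled
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate data[offset..]. nowrap=false expects a zlib header and verifies the Adler-32 trailer;
    // nowrap=true treats the input as raw deflate data, so any trailing bytes are left unread.
    static byte[] inflate(byte[] data, int offset, boolean nowrap) throws DataFormatException {
        Inflater inflater = new Inflater(nowrap);
        inflater.setInput(data, offset, data.length - offset);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated stream: keep whatever was decoded so far
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "BT /F1 12 Tf 72 712 Td (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] stream = zlibCompress(original);

        // Damage only the Adler-32 trailer, leaving the deflate data untouched.
        stream[stream.length - 1] ^= (byte) 0xFF;

        try {
            inflate(stream, 0, false);
            System.out.println("strict: decoded");
        } catch (DataFormatException e) {
            // zlib reports the checksum mismatch here
            System.out.println("strict: " + e.getMessage());
        }

        // Lenient path: skip the 2-byte zlib header and inflate raw deflate data.
        byte[] recovered = inflate(stream, 2, true);
        System.out.println("lenient recovers data: " + Arrays.equals(original, recovered));
    }
}
```

      Skipping the checksum trades integrity verification for robustness: a stream that is damaged inside the deflate data may then decode to garbage or stop early instead of failing loudly.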

        Attachments

        1. sc-356376(1).pdf
          60 kB
          Felix Rudolphi
        2. sc-356376.pdf
          56 kB
          Felix Rudolphi
        3. sc-356376(1)-x.pdf
          60 kB
          Felix Rudolphi
        4. sc-356376-x.pdf
          55 kB
          Felix Rudolphi
        5. PDFBOX2976_FlateFilter2.patch
          2 kB
          Andreas Lehmkühler
        6. 500 ml (500.0) - Bisomer® MPEG350MA - 26915-72-0 - IVW_ 444 Oberflächentechnik.pdf
          88 kB
          Felix Rudolphi

              People

              • Assignee: Andreas Lehmkühler (lehmi)
              • Reporter: Felix Rudolphi (chemFelix)
              • Votes: 0
              • Watchers: 5

                  Time Tracking

                  • Estimated: 3h
                  • Remaining: 3h
                  • Logged: Not Specified