  1. PDFBox
  2. PDFBOX-1151

StreamCorruptedException on bad PDF with -force

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 1.6.0, 1.8.7, 2.0.0
    • Fix Version/s: None
    • Component/s: Parsing
    • Labels: None
    • Environment: Windows Vista, Sun JDK 1.6.0_26

    Description

      I get a StreamCorruptedException when parsing a possibly invalid PDF document, even when the -force option is specified.

      Stack trace:

      java.io.StreamCorruptedException: Error: data is null
      at org.apache.pdfbox.filter.LZWFilter.decode(LZWFilter.java:82)
      at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:301)
      at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:221)
      at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
      at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:105)
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:264)
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
      at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
      at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
      at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
      at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
      at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:256)
      at org.apache.pdfbox.ExtractText.main(ExtractText.java:76)
      at org.apache.pdfbox.PDFBox.main(PDFBox.java:42)

      My suggestion is that, when forceParsing is true, PDFStreamEngine.processSubStream() should skip bad sub-streams instead of throwing an exception.
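
      The idea behind this suggestion can be sketched roughly as follows. This is only an illustration, not the attached patch and not PDFBox code: the class ForceParsingSketch, the SubStream interface, and the processAll method are hypothetical stand-ins, and the sketch uses Java 8 lambdas for brevity. It assumes that a decode failure surfaces as an IOException (StreamCorruptedException is a subclass) and that skipping the affected sub-stream is acceptable when forced parsing is requested.

```java
import java.io.IOException;
import java.io.StreamCorruptedException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ForceParsingSketch {

    /** Hypothetical stand-in for a content sub-stream; not a PDFBox type. */
    interface SubStream {
        String decode() throws IOException;
    }

    /**
     * Decodes every sub-stream in order. When forceParsing is true, a
     * sub-stream that fails to decode is logged and skipped; otherwise
     * the first failure is rethrown to the caller.
     */
    static List<String> processAll(List<SubStream> streams, boolean forceParsing)
            throws IOException {
        List<String> decoded = new ArrayList<>();
        for (SubStream stream : streams) {
            try {
                decoded.add(stream.decode());
            } catch (IOException e) {
                if (!forceParsing) {
                    throw e; // strict mode: propagate the failure
                }
                // lenient mode: report the problem and continue with the next sub-stream
                System.err.println("Skipping bad sub-stream: " + e.getMessage());
            }
        }
        return decoded;
    }

    public static void main(String[] args) throws IOException {
        List<SubStream> streams = Arrays.asList(
                () -> "page text A",
                () -> { throw new StreamCorruptedException("Error: data is null"); },
                () -> "page text B");

        // With forceParsing enabled the corrupt middle sub-stream is skipped.
        System.out.println(processAll(streams, true)); // prints [page text A, page text B]
    }
}
```

      With forceParsing set to false, the same call rethrows the StreamCorruptedException, which matches the behaviour shown in the stack trace above.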

      Attachments

        1. PDFStreamEngine.patch
          0.7 kB
          Stas Shaposhnikov
        2. test.pdf
          122 kB
          Stas Shaposhnikov

          People

            Assignee: Andreas Lehmkühler (lehmi)
            Reporter: Stas Shaposhnikov (stas.shaposhnikov)