PDFBox: PDFBOX-3292

Error reading stream, expected='endstream' actual='' in non-truncated files


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.1, 3.0.0 PDFBox
    • Component/s: Parsing
    • Labels:
      None

      Description

      When PDF files are truncated, one of the most common exceptions in PDFBox 2.0.0 is:

      java.io.IOException: Error reading stream, expected='endstream' actual='' at offset 165888
      	at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:999)
      	at org.apache.pdfbox.pdfparser.COSParser.parseXrefObjStream(COSParser.java:326)
      	at org.apache.pdfbox.pdfparser.COSParser.parseXref(COSParser.java:287)
      	at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:192)
      	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249)
      	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:847)
      

      Two files in govdocs1 are NOT truncated, yet they trigger this exception in 2.0.0; PDFBox 1.8.11 parsed both of them with its classic parser.
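      To make the error message concrete: after skipping the number of bytes declared by a stream's /Length, a parser expects to find the `endstream` keyword, and an empty `actual` value means no keyword was found at that position. The following is a simplified, hypothetical Java sketch of such a check. The class and method names (`EndstreamCheck`, `tokenAfterStreamData`, `checkEndstream`) are illustrative only; this is not PDFBox's API and not the actual logic of `COSParser.parseCOSStream`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified illustration of an "expected='endstream'" check.
// Not PDFBox code: names and message format are for illustration only.
final class EndstreamCheck {

    // PDF whitespace characters: NUL, TAB, LF, FF, CR, SPACE.
    static boolean isPdfWhitespace(byte b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    // Returns the run of ASCII letters found after the stream data, or ""
    // if the input ends (or a non-letter appears) before any letter is read.
    static String tokenAfterStreamData(byte[] pdf, int dataStart, int length) {
        // Jump over the declared /Length bytes, but never past the end of input.
        int pos = Math.min(dataStart + length, pdf.length);
        while (pos < pdf.length && isPdfWhitespace(pdf[pos])) {
            pos++;
        }
        int start = pos;
        while (pos < pdf.length
                && ((pdf[pos] >= 'a' && pdf[pos] <= 'z') || (pdf[pos] >= 'A' && pdf[pos] <= 'Z'))) {
            pos++;
        }
        return new String(pdf, start, pos - start, StandardCharsets.US_ASCII);
    }

    // Throws if the token after the stream data is not "endstream".
    static void checkEndstream(byte[] pdf, int dataStart, int length) throws IOException {
        String actual = tokenAfterStreamData(pdf, dataStart, length);
        if (!"endstream".equals(actual)) {
            throw new IOException("Error reading stream, expected='endstream' actual='"
                    + actual + "' at offset " + Math.min(dataStart + length, pdf.length));
        }
    }

    public static void main(String[] args) throws IOException {
        // "stream\n" is 7 bytes, so the stream data starts at index 7.
        byte[] complete = "stream\nHELLO\nendstream".getBytes(StandardCharsets.US_ASCII);
        checkEndstream(complete, 7, 5); // keyword found: no exception

        // Truncated input: the declared /Length of 5 runs past the end of the data.
        byte[] truncated = "stream\nHEL".getBytes(StandardCharsets.US_ASCII);
        try {
            checkEndstream(truncated, 7, 5);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      Running `main` prints the mismatch message only for the truncated buffer. In this sketch an empty `actual` value also results whenever the bytes at the computed position are not letters, so the message by itself does not prove that a file is truncated, which is consistent with the two non-truncated govdocs1 files above.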


            People

            • Assignee:
              Andreas Lehmkühler (lehmi)
              Reporter:
              Tim Allison (tallison)
            • Votes:
              0
              Watchers:
              4
