PDFBox / PDFBOX-3292

Error reading stream, expected='endstream' actual='' in non-truncated files

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.1, 3.0.0 PDFBox
    • Component/s: Parsing
    • Labels: None

    Description

      When PDF files are truncated, one of the most common exceptions in PDFBox 2.0.0 is:

      java.io.IOException: Error reading stream, expected='endstream' actual='' at offset 165888
      	at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:999)
      	at org.apache.pdfbox.pdfparser.COSParser.parseXrefObjStream(COSParser.java:326)
      	at org.apache.pdfbox.pdfparser.COSParser.parseXref(COSParser.java:287)
      	at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:192)
      	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249)
      	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:847)
      

      There are two files in govdocs1 that are NOT truncated and trigger this exception in 2.0.0, but were parsed by PDFBox 1.8.11 with the classic parser.
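The distinction between truncated and non-truncated files can be checked roughly before filing a parser bug. The following is only a sketch under stated assumptions: `PdfTruncationCheck` and `hasEofMarker` are hypothetical names, not part of PDFBox, and the 1024-byte search window is an assumed reader leniency rather than anything this issue specifies. A missing `%%EOF` marker suggests truncation; a present marker does not prove the file is intact.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox: a rough truncation heuristic.
public class PdfTruncationCheck {

    // Assumption: readers commonly accept %%EOF anywhere in the last 1024 bytes.
    private static final int TAIL_WINDOW = 1024;

    // Returns true if the end-of-file marker appears in the tail of the data.
    static boolean hasEofMarker(byte[] pdf) {
        int window = Math.min(pdf.length, TAIL_WINDOW);
        // ISO-8859-1 maps every byte to one char, so binary data is safe here.
        String tail = new String(pdf, pdf.length - window, window,
                StandardCharsets.ISO_8859_1);
        return tail.contains("%%EOF");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PdfTruncationCheck <file.pdf>");
            return;
        }
        // Reads the whole file into memory, which is fine for a sketch.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(hasEofMarker(data)
                ? "%%EOF found near end of file (probably not truncated)"
                : "%%EOF missing (file is probably truncated)");
    }
}
```

If a file passes this check and still raises the exception above, it is a candidate for the kind of regression described in this issue.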

          People

            Assignee: lehmi Andreas Lehmkühler
            Reporter: tallison Tim Allison