  1. PDFBox
  2. PDFBOX-803

Improved handling of erroneous data between endstream and endobj lines

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.1
    • Component/s: None
    • Labels: None

    Description

      I found that a PDF created by Exstream Dialogue version 5.0.039 had a stray ">> " between the endstream and endobj keywords. When PDFBox encountered this, it threw an exception. This patch ignores junk characters between the two keywords so that such files can be processed, and writes a log message warning the user that the file violates the spec. For reference, here's the object I found in the file (excluding the stream data):
      27 0 obj
      <<
      /Filter [/A85 /Fl]
      /Length 322
      >>
      stream
      (data from stream omitted)
      endstream
      >> endobj
      %PDF Font (F315)
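
      To illustrate the lenient behaviour described above, here is a minimal sketch in Java. It is not the attached patch and does not use PDFBox's parser classes; the class name EndObjSkipper and the method skipToEndObj are hypothetical, and the input is assumed to be the text that follows the endstream keyword. It skips whitespace-delimited tokens until it reaches endobj and logs a warning if anything had to be skipped.

      ```java
      import java.util.Scanner;
      import java.util.logging.Logger;

      // Hypothetical helper, for illustration only (not part of PDFBox).
      class EndObjSkipper {
          private static final Logger LOG = Logger.getLogger(EndObjSkipper.class.getName());

          /**
           * Consumes tokens that follow "endstream" until "endobj" is found.
           * Returns the junk that was skipped, or an empty string if the data was clean.
           */
          static String skipToEndObj(String afterEndstream) {
              StringBuilder junk = new StringBuilder();
              try (Scanner tokens = new Scanner(afterEndstream)) {
                  while (tokens.hasNext()) {
                      String token = tokens.next();
                      if (token.equals("endobj")) {
                          if (junk.length() > 0) {
                              // Spec violation: warn, but keep parsing instead of failing.
                              LOG.warning("Ignoring unexpected data between endstream and endobj: '"
                                      + junk + "'");
                          }
                          return junk.toString();
                      }
                      if (junk.length() > 0) {
                          junk.append(' ');
                      }
                      junk.append(token);
                  }
              }
              // No endobj at all is still treated as an error in this sketch.
              throw new IllegalStateException("Expected 'endobj' but reached end of data");
          }
      }
      ```

      With the object above, the text after endstream is ">> endobj" followed by the comment line, so skipToEndObj would return ">>" and log one warning. A real parser would also have to leave the stream positioned just after endobj, which this string-based sketch does not attempt.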

      As a side note, Exstream seems to have sold its Dialogue software to HP, and the current version is 7. The bug is therefore likely fixed in the latest version, but some older PDFs are still out there, and PDFBox should be able to handle them without throwing an exception.

          People

            adamnichols Adam Nichols