  1. PDFBox
  2. PDFBOX-818

PDFParser fails if object/xref starts at same line as endobj of a stream object

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.3.1
    • Fix Version/s: 1.6.0
    • Component/s: Parsing
    • Labels: None

    Description

      If an object or xref starts on the same line as the 'endobj' token, directly after it, and the object being closed contains a stream, parsing of the next object fails.
      Example:
      endstream
      endobj xref
      0 26
      In PDFParser, if an object contains a stream, the 'endobj' token is read via readLine(), which consumes the line break as well. The case of 'endobj' followed by another keyword on the same line is handled, but only 'xref' is pushed back and not the line break, so the parser sees 'xref0' when it tries to read the next object. A simple solution for this case is to push back a space byte before the 'xref'.
      I will add a patch for it.
      Part of the problem can be seen in the PDF from http://onlinelibrary.wiley.com/doi/10.1111/j.1399-6576.2009.02134.x/pdf at the last 'endobj'. However, the last object in that file does not contain a stream, and I was not able to produce a PDF that does (the PDFs I have that contain the problematic construct are unfortunately confidential).
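The pushback behaviour described above can be reproduced with a plain java.io.PushbackInputStream. The following is only a minimal sketch under stated assumptions: it is neither the PDFParser code nor the attached patch, and the class and helper names (EndobjPushbackSketch, readLine, readToken, tokenAfterEndobj) are hypothetical. It reads one reasonable interpretation into "push back a space byte before the 'xref'": the space is unread first, so that it ends up between 'xref' and the bytes of the following line.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the parser logic; not PDFBox code.
class EndobjPushbackSketch {

    // Reads up to the next '\n' and consumes it, like a readLine() that
    // swallows the line break.
    static String readLine(PushbackInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Skips leading whitespace, then reads one whitespace-delimited token.
    static String readToken(PushbackInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && Character.isWhitespace(c)) {
            // skip leading whitespace
        }
        while (c != -1 && !Character.isWhitespace(c)) {
            sb.append((char) c);
            c = in.read();
        }
        return sb.toString();
    }

    // Reads the "endobj ..." line, pushes the trailing keyword back and
    // returns the next token the parser would see.
    static String tokenAfterEndobj(String input, boolean pushBackSeparator) {
        try {
            PushbackInputStream in = new PushbackInputStream(
                    new ByteArrayInputStream(input.getBytes(StandardCharsets.ISO_8859_1)), 64);
            String line = readLine(in);                             // "endobj xref"
            String rest = line.substring("endobj".length()).trim(); // "xref"
            if (pushBackSeparator) {
                // Unread first, so it is read back after 'xref' and
                // replaces the line break that readLine() consumed.
                in.unread(' ');
            }
            in.unread(rest.getBytes(StandardCharsets.ISO_8859_1));
            return readToken(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String input = "endobj xref\n0 26\n";
        System.out.println(tokenAfterEndobj(input, false)); // prints "xref0"
        System.out.println(tokenAfterEndobj(input, true));  // prints "xref"
    }
}
```

Without the separator the pushed-back keyword runs into the next line's first number, which matches the 'xref0' symptom; with the extra space the keyword is read cleanly.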

      Attachments

        1. pdfbox_issue818.patch
          0.7 kB
          Timo Boehme

            People

              Assignee: Unassigned
              Reporter: Timo Boehme (tboehme)
              Votes: 0
              Watchers: 1
