PDFBox / PDFBOX-818

PDFParser fails if an object/xref starts on the same line as the endobj of a stream object

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.3.1
    • Fix Version/s: 1.6.0
    • Component/s: Parsing
    • Labels:
      None

      Description

      If an object or xref starts on the same line, directly after the 'endobj' token, and the object being closed contains a stream, parsing of the next object fails.
      Example:
      endstream
      endobj xref
      0 26
      In PDFParser, if an object contains a stream, the 'endobj' token is read via readLine(), so the line break is consumed as well. The 'endobj' followed by another keyword on the same line is handled, but only 'xref' is pushed back and not the line break. The pushed-back 'xref' is therefore immediately followed by the '0' of the next line, which results in 'xref0' when trying to read the next object. In this case a simple solution is to push back a space byte before the 'xref'.
      I will add a patch for it.
      Part of the problem can be seen in the PDF from http://onlinelibrary.wiley.com/doi/10.1111/j.1399-6576.2009.02134.x/pdf at the last 'endobj'. However, the last object in that file does not contain a stream, and I was not able to produce a PDF that does (the PDFs I have that contain the problematic construct are unfortunately confidential).
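
      The push-back behaviour described above can be reproduced outside PDFBox. The following is a minimal standalone sketch, not the PDFParser code and not the patch: the class EndobjPushbackSketch and its helpers readLine, readToken and nextTokenAfterEndobj are hypothetical names invented for this illustration, built only on java.io.PushbackInputStream. It assumes the tail of the line after 'endobj' is pushed back as-is, with or without a separating space.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not PDFBox code.
class EndobjPushbackSketch {

    // Reads up to and including the next line break and returns the line without it.
    static String readLine(PushbackInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c = in.read();
        while (c != -1 && c != '\n' && c != '\r') {
            sb.append((char) c);
            c = in.read();
        }
        if (c == '\r') {
            // Treat CR LF as a single line break.
            int next = in.read();
            if (next != -1 && next != '\n') {
                in.unread(next);
            }
        }
        return sb.toString();
    }

    // Skips leading whitespace, then reads characters up to the next whitespace.
    static String readToken(PushbackInputStream in) throws IOException {
        int c = in.read();
        while (c != -1 && Character.isWhitespace(c)) {
            c = in.read();
        }
        StringBuilder sb = new StringBuilder();
        while (c != -1 && !Character.isWhitespace(c)) {
            sb.append((char) c);
            c = in.read();
        }
        if (c != -1) {
            in.unread(c);
        }
        return sb.toString();
    }

    // Consumes the "endobj ..." line, pushes back whatever followed 'endobj',
    // and returns the next token the parser would see.
    static String nextTokenAfterEndobj(String data, boolean pushBackSpace) {
        try {
            byte[] bytes = data.getBytes(StandardCharsets.ISO_8859_1);
            PushbackInputStream in =
                    new PushbackInputStream(new ByteArrayInputStream(bytes), 64);
            String line = readLine(in); // line break is consumed here
            String rest = line.substring("endobj".length()).trim();
            if (pushBackSpace) {
                // Unread the separator first so it ends up AFTER the keyword.
                in.unread(' ');
            }
            in.unread(rest.getBytes(StandardCharsets.ISO_8859_1));
            return readToken(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String data = "endobj xref\n0 26\n";
        System.out.println(nextTokenAfterEndobj(data, false)); // prints "xref0"
        System.out.println(nextTokenAfterEndobj(data, true));  // prints "xref"
    }
}
```

      Because PushbackInputStream returns the most recently unread bytes first, the space has to be unread before the keyword so that it ends up between 'xref' and the '0' of the following line.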

              People

              • Assignee: Unassigned
              • Reporter: Timo Boehme (tboehme)