Uploaded image for project: 'PDFBox'
  1. PDFBox
  2. PDFBOX-1042

Wrong XRefStream order while parsing incremental updated PDF with XRefStreams

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: Parsing
    • Labels:
      None

      Description

      A PDF can contain two types of XRef-Entries.
      Most files use XRefTables for object references.

      Web-Optimized (linearized) pdf document uses XRefStreams. This is a compresed XRefTable as ObjectStream. The PDFParser parse this objects the same way as other objects and put them into an object pool (HashMap). If the document was incremental updated, more XRefStreams would be in the pdf document and all will be put into the object pool.

      The XRefStreamParser begin to parse the XRefStreams and try to gain all XRefStream-Object from that pool. The objects returned from the pool aren't in the same order as read. This cause that in some cases the older Object overwrite the newer one. And this cause that the pdfbox can't find the right objects and use the older one instead.

      If a user try to parse such a document, he will got an indeterminate state. older and newer objects are mixed.

      In my case, a document catalog was overwrote by an old one and i can't see the changes that was made with the incremental update.

      A patch and a sample pdf will come soon.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                tchojecki Thomas Chojecki
              • Votes:
                0 Vote for this issue
                Watchers:
                0 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: