[PDFBOX-1042] Wrong XRefStream order while parsing incremental updated PDF with XRefStreams - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Duplicate
Affects Version/s: 1.5.0
Fix Version/s: None
Component/s: Parsing
Labels:
None

Description

A PDF can contain two types of XRef-Entries.
Most files use XRefTables for object references.

Web-Optimized (linearized) pdf document uses XRefStreams. This is a compresed XRefTable as ObjectStream. The PDFParser parse this objects the same way as other objects and put them into an object pool (HashMap). If the document was incremental updated, more XRefStreams would be in the pdf document and all will be put into the object pool.

The XRefStreamParser begin to parse the XRefStreams and try to gain all XRefStream-Object from that pool. The objects returned from the pool aren't in the same order as read. This cause that in some cases the older Object overwrite the newer one. And this cause that the pdfbox can't find the right objects and use the older one instead.

If a user try to parse such a document, he will got an indeterminate state. older and newer objects are mixed.

In my case, a document catalog was overwrote by an old one and i can't see the changes that was made with the incremental update.

A patch and a sample pdf will come soon.

Attachments

Issue Links

is duplicated by

PDFBOX-1016 Specification conform xref/trailer parsing + Fix

Closed

Activity

People

Assignee:: Unassigned

Reporter:: Thomas Chojecki

Votes:: 0 Vote for this issue

Watchers:: 0 Start watching this issue

Dates

Created:: 20/Jun/11 16:15

Updated:: 21/Jun/11 14:51

Resolved:: 21/Jun/11 14:51