[PDFBOX-1016] Specification conform xref/trailer parsing + Fix - ASF JIRA


Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6.0
Fix Version/s: 1.6.0
Component/s: Parsing
Labels: None

Description

PDFBOX currently reads xref tables/trailers and XRef objects without using the startxref or 'Prev' information. As a result it applies data that is no longer active, which leads either to wrong objects being used or to parsing exceptions because old trailer settings no longer apply. This happens especially with incrementally updated PDF documents, where changes are simply appended and old objects/xref entries remain in the file but are no longer referenced. My last patch (PDFBOX-1014) tried to solve this for a specific case, but it was based on assumptions that do not hold in every case.

The specification-compliant way is to read the last startxref, which points to the most recent xref section (table or XRef stream); that section may in turn reference earlier xref sections through the 'Prev' entry of its trailer.
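To illustrate this lookup order, here is a minimal Java sketch (not PDFBox code). It assumes each xref section has already been parsed into a map from its byte offset to the value of its trailer's 'Prev' entry (null when absent); the names XrefChain and activeSections are hypothetical. Sections that cannot be reached from startxref through 'Prev' are never visited, and a visited set guards against 'Prev' loops in damaged files.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of PDFBox: computes which xref sections are active. */
final class XrefChain {
    private XrefChain() {}

    /**
     * Returns the offsets of the active xref sections, newest first, by starting
     * at startxref and following each trailer's 'Prev' entry.
     *
     * @param startxref    offset taken from the last startxref keyword in the file
     * @param prevByOffset maps each parsed xref section's offset to its trailer's
     *                     'Prev' value, or to null if the trailer has no 'Prev'
     */
    static List<Long> activeSections(long startxref, Map<Long, Long> prevByOffset) {
        List<Long> chain = new ArrayList<>();
        Set<Long> seen = new HashSet<>();   // stops on 'Prev' cycles
        Long current = startxref;
        // Stop when 'Prev' is absent, points to an unknown offset, or loops back.
        while (current != null && prevByOffset.containsKey(current) && seen.add(current)) {
            chain.add(current);
            current = prevByOffset.get(current);
        }
        return chain;
    }
}
```

For a file whose last startxref is 900, where the section at 900 has Prev 100 and an unreferenced section sits at 500, activeSections(900, ...) returns [900, 100]; the stale section at 500 is ignored.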

I have written a fix that works the standard way and can fall back to the old behavior if startxref is wrong or missing. The fix tries to be as unobtrusive as possible. A new class (o.a.p.pdfparser.XrefTrailerResolver) is filled with all xref table/trailer and XRef object data. After the document has been parsed (and the last startxref has been read), this class builds the xref table and trailer using the startxref and 'Prev' information. Besides this new class, there are only small changes to PDFParser and COSDocument.
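The resolver approach can be sketched roughly as follows under simplified assumptions: objects are keyed by "objNum genNum" strings, offsets are longs, and trailer entries live in a plain map. SimpleXrefResolver, Section, beginSection and resolve are hypothetical names and do not reflect the API of the attached XrefTrailerResolver. The parser records every section it meets; once the last startxref is known, resolve() walks the 'Prev' chain and merges the sections oldest first so that newer entries win. When startxref is missing or matches no recorded section, the sketch merges every section in file order, which is only an assumed approximation of the old behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified resolver; not PDFBox's actual XrefTrailerResolver API. */
final class SimpleXrefResolver {
    /** One parsed xref section (table or stream) plus its trailer dictionary. */
    static final class Section {
        final Map<String, Long> objectOffsets = new HashMap<>(); // "objNum genNum" -> byte offset
        final Map<String, Object> trailer = new HashMap<>();     // e.g. "Size", "Root", "Prev"
    }

    /** Merged xref table and trailer produced after the whole file has been read. */
    static final class Resolved {
        final Map<String, Long> xref = new HashMap<>();
        final Map<String, Object> trailer = new HashMap<>();
    }

    // Sections keyed by byte offset, kept in the order they were read from the file.
    private final Map<Long, Section> sections = new LinkedHashMap<>();

    /** Called by the parser for every xref section it encounters, active or not. */
    Section beginSection(long offset) {
        Section s = new Section();
        sections.put(offset, s);
        return s;
    }

    /** Builds the effective xref table and trailer; startxref may be null if it was not found. */
    Resolved resolve(Long startxref) {
        List<Section> order = new ArrayList<>();
        if (startxref != null && sections.containsKey(startxref)) {
            // Spec-conform path: follow 'Prev' from the newest section back to the oldest.
            Set<Long> seen = new HashSet<>();
            Long offset = startxref;
            while (offset != null && sections.containsKey(offset) && seen.add(offset)) {
                Section s = sections.get(offset);
                order.add(0, s); // prepend, so the list ends with the newest section
                Object prev = s.trailer.get("Prev");
                offset = prev instanceof Long ? (Long) prev : null;
            }
        } else {
            // Assumed fallback: use every recorded section in file order.
            order.addAll(sections.values());
        }
        Resolved r = new Resolved();
        for (Section s : order) { // oldest first, so newer sections overwrite older entries
            r.xref.putAll(s.objectOffsets);
            r.trailer.putAll(s.trailer);
        }
        return r;
    }
}
```

Recording every section unconditionally and deciding which ones count only at the end presumably keeps the parser changes small, since the parsing code never has to know whether a section is still active.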

This bugfix/improvement should bring PDFBOX a good step closer to PDF specification conformance, especially as long as the new specification-conform parser project is not finished.

This bugfix supersedes the fix from PDFBOX-1014.

Attachments


- XrefTrailerResolver.java (7 kB), Timo Boehme, 19/May/11 14:15
- COSDocument.diff (2 kB), Timo Boehme, 19/May/11 14:18
- PDFParser.diff (5 kB), Timo Boehme, 19/May/11 14:18
- XrefTrailerResolver.java (7 kB), Timo Boehme, 19/May/11 15:01
- PDFXrefStreamParser.diff (2 kB), Timo Boehme, 14/Jun/11 07:51

Issue Links

duplicates: PDFBOX-1042 Wrong XRefStream order while parsing incremental updated PDF with XRefStreams (Closed)

People

Assignee: Andreas Lehmkühler

Reporter: Timo Boehme

Votes: 0

Watchers: 0

Dates

Created: 19/May/11 14:13

Updated: 02/Jul/11 16:28

Resolved: 23/Jun/11 17:19

Time Tracking

Estimated: 10m

Remaining: 10m
