[PDFBOX-720] Inconsistency in parsing PDFs between Windows and Linux - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Not A Problem
Affects Version/s: None
Fix Version/s: None
Component/s: Parsing
Labels: None
Environment:
Windows Vista 32-bit, Sun JDK 1.5.0_06, PDFBox HEAD tag (revision 941073)
vs.
Red Hat Linux, 2.6.9-67.ELsmp kernel, Java 1.5.0_06, PDFBox HEAD tag (revision 941073)

Description

Run this same code using the same PDF and you'll get different results on Linux than on Windows. Regardless of which one you consider "correct", it should be consistent.

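import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDDocumentOutline;

// inputFile: the PDF under test
PDDocument doc = PDDocument.load(inputFile);
PDDocumentOutline outline = doc.getDocumentCatalog().getDocumentOutline();
if (outline == null) {
    System.out.println("Document outline was null");
} else {
    System.out.println("Document outline was not null");
}
doc.close();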

Some interesting notes about this PDF: it seems that Acrobat Distiller 8.1.0 basically just concatenated two PDFs into one. There are two trailers, and both refer to object "1600 0" as the root. Object 1600 0 appears more than once: one definition has no "Outlines" entry in its dictionary, while the other has "Outlines 1667 0". Windows picks up the latter and shows the outline correctly; Linux picks up the former and thus returns null for the outline. I tried debugging through PDFParser and BaseParser, but I'm not really sure how that code works and I quickly got lost.
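To check the duplicate-object observation independently of PDFBox, a rough scan of the raw file can list every place where a given object is defined and whether that definition carries a particular key. The following is only a minimal sketch under stated assumptions: the class name ObjectRevisionScan and its methods are hypothetical helpers (not part of PDFBox), the objects are assumed to be stored uncompressed, and the plain text search can be fooled by matching bytes inside streams. The sample data in main is synthetic and merely mimics the layout described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of PDFBox.
public class ObjectRevisionScan {

    /** Returns the byte offset of every "<objNum> <genNum> obj" header in the raw data. */
    static List<Integer> findDefinitions(byte[] pdf, int objNum, int genNum) {
        // ISO-8859-1 maps each byte to exactly one char, so string indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        String header = objNum + " " + genNum + " obj";
        List<Integer> offsets = new ArrayList<>();
        int from = 0;
        while (true) {
            int idx = text.indexOf(header, from);
            if (idx < 0) {
                break;
            }
            // Skip matches that are the tail of a longer number, e.g. "11600 0 obj".
            if (idx == 0 || !Character.isDigit(text.charAt(idx - 1))) {
                offsets.add(idx);
            }
            from = idx + header.length();
        }
        return offsets;
    }

    /** Naive check: does the text between the given offset and the next "endobj" contain the key? */
    static boolean hasKey(byte[] pdf, int offset, String key) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int end = text.indexOf("endobj", offset);
        String body = end < 0 ? text.substring(offset) : text.substring(offset, end);
        return body.contains(key);
    }

    public static void main(String[] args) {
        // Synthetic stand-in for the reported file: two definitions of 1600 0, two trailers.
        String sample =
            "1600 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
            + "trailer\n<< /Root 1600 0 R >>\n"
            + "1600 0 obj\n<< /Type /Catalog /Pages 2 0 R /Outlines 1667 0 R >>\nendobj\n"
            + "trailer\n<< /Root 1600 0 R >>\n";
        byte[] data = sample.getBytes(StandardCharsets.ISO_8859_1);

        for (int offset : findDefinitions(data, 1600, 0)) {
            System.out.println("1600 0 defined at offset " + offset
                + ", has /Outlines: " + hasKey(data, offset, "/Outlines"));
        }
    }
}
```

Running the same scan over the real file's bytes on both machines would show whether the file content is identical and which definition of the root object comes last. In an incrementally updated PDF, the definition referenced by the most recent cross-reference section is the one a reader is expected to use, so the offsets can then be compared against the xref tables by hand.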

Attachments

238_Page_Report.pdf
04/May/10 22:57
4.57 MB
Adam Nichols

People

Assignee: Unassigned

Reporter: Adam Nichols

Votes: 0

Watchers: 4

Dates

Created: 04/May/10 22:54

Updated: 06/Jan/15 07:44

Resolved: 10/Oct/14 21:15