[PDFBOX-5405] "Page tree root must be a dictionary" when attempting to parse pdf - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Not A Bug
Affects Version/s: 2.0.25
Fix Version/s: None
Component/s: Parsing
Labels: None

Description

Hi,

I have a PDF file that throws the following error when I try to parse it:

Caused by: java.io.IOException: Page tree root must be a dictionary
    at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1228)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1202)
    at org.apache.tika.parser.pdf.PDFParser.getPDDocument(PDFParser.java:191)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:149)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:289)
    ... 5 more

I have attached the file in question to this issue.

Might be related to PDFBOX-4915.
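
Since the trace above begins with "Caused by:", the IOException reaches the caller wrapped in another exception. As a minimal sketch for callers who want to recognise this particular failure, the helper below walks the cause chain and looks for the message text. The class and method names (PageTreeErrors, findPageTreeError) are hypothetical and not part of PDFBox or Tika; the message string is copied from the trace, and the wrapper exception in main is only a stand-in for whatever the calling library throws.

```java
import java.io.IOException;

// Hypothetical helper for illustration; not part of PDFBox or Tika.
final class PageTreeErrors {
    // Message text copied from the stack trace in this report.
    static final String MESSAGE = "Page tree root must be a dictionary";

    private PageTreeErrors() {}

    /**
     * Returns the first IOException in the cause chain of t whose message
     * contains MESSAGE, or null if there is none. The depth limit guards
     * against a cyclic cause chain.
     */
    static IOException findPageTreeError(Throwable t) {
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
            String msg = cur.getMessage();
            if (cur instanceof IOException && msg != null && msg.contains(MESSAGE)) {
                return (IOException) cur;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Simulate the wrapping seen in the trace with a stand-in outer exception.
        IOException root = new IOException(MESSAGE);
        Exception wrapped = new RuntimeException("wrapper", root);
        System.out.println(findPageTreeError(wrapped) == root); // prints true
    }
}
```

Matching on message text is brittle, because the wording may differ between PDFBox versions; treat this as a diagnostic aid under that assumption rather than a stable contract.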

Attachments

Grafiska riktlinjer, fordon LRV.pdf (480 kB), attached 30/Mar/22 14:15 by Johannes Wirkkala Westlund

People

Assignee: Unassigned

Reporter: Johannes Wirkkala Westlund

Votes: 0

Watchers: 3

Dates

Created: 30/Mar/22 14:15

Updated: 03/Apr/22 18:28

Resolved: 03/Apr/22 18:28