[TIKA-1948] Catch exceptions per page in PDFParser - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.13, 2.0.0
Component/s: None
Labels: None

Description

In a discussion with tilman somewhere (???), I think he observed that we weren't wrapping each page in its own try/catch. If an exception is thrown on an early page, it might still be possible to extract text from later pages in a problematic PDF.

With very minimal modifications we could add a try/catch per page, store the caught exceptions, and then throw the first caught exception after the parse finishes.
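
The proposal above could be sketched roughly as follows. This is only an illustration of the pattern, not the actual PDFParser change: `PerPageExtraction`, `PageExtractor`, and `extractAll` are hypothetical names, and a `StringBuilder` stands in for wherever the extracted text is really written, so that text from good pages survives even though an exception is thrown at the end.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the per-page try/catch idea; not Tika's real code.
class PerPageExtraction {

    // Hypothetical stand-in for whatever extracts the text of a single page.
    interface PageExtractor {
        String extract(int pageIndex) throws IOException;
    }

    // Extracts every page, appending text to 'out' as it goes.
    // A failure on one page is recorded and the loop moves on to the next page.
    // After the last page, the first recorded exception (if any) is rethrown.
    static void extractAll(int pageCount, PageExtractor extractor, StringBuilder out)
            throws IOException {
        List<IOException> caught = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            try {
                out.append(extractor.extract(i));
            } catch (IOException e) {
                // Store the exception instead of aborting the whole parse.
                caught.add(e);
            }
        }
        if (!caught.isEmpty()) {
            // Surface the earliest problem once the parse has finished.
            throw caught.get(0);
        }
    }
}
```

Because the text is appended before the final throw, a caller that catches the exception can still use whatever was extracted from the pages that parsed cleanly. This sketch only catches `IOException`; whether runtime exceptions should also be caught per page is a separate decision.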

People

Assignee: Tim Allison
Reporter: Tim Allison
Votes: 2
Watchers: 4

Dates

Created: 11/Apr/16 19:43
Updated: 12/Apr/21 12:59
Resolved: 13/Apr/16 01:02