[TIKA-2159] Handle pre-parse embedded object exceptions uniformly and more robustly - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.15, 2.0.0
Component/s: parser
Labels: None

Description

When parsing an embedded document throws an exception, we currently either catch and swallow it in ParsingEmbeddedDocumentExtractor (the default) or, in the RecursiveParserWrapper, report it by storing the stack trace in the Metadata of the embedded document.

However, if an exception occurs before the stream reaches the parser, either while getting the embedded stream or during detection on it, we aren't handling that uniformly or robustly across parsers.
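To make the gap concrete, here is a minimal sketch of one way pre-parse failures could be handled in a single place. It is not the change that resolved this issue and it does not use Tika's API: the class `EmbeddedPreParseSketch`, the `StreamSource` interface, the `sketch:` metadata keys, and the toy `detect` method are all hypothetical, and a plain `Map` stands in for Metadata. The idea is only that opening the stream and running detection share one guard, and that any failure there is recorded on the embedded document's metadata rather than escaping to the container parser.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class EmbeddedPreParseSketch {

    // Hypothetical metadata keys for this sketch; not real Tika constants.
    static final String CONTENT_TYPE = "sketch:content_type";
    static final String EMBEDDED_STREAM_EXCEPTION = "sketch:embedded_stream_exception";

    /** Hypothetical source of an embedded stream; opening it may fail. */
    interface StreamSource {
        InputStream open() throws IOException;
    }

    /**
     * Opens the embedded stream and runs detection under one guard.
     * A failure in either step is stored in the metadata and the parser
     * is skipped. Returns true only if the parser was actually invoked.
     */
    static boolean processEmbedded(StreamSource source,
                                   Map<String, String> metadata,
                                   BiConsumer<InputStream, Map<String, String>> parser) {
        InputStream stream = null;
        try {
            InputStream opened = source.open();
            // Detection below needs mark/reset, so wrap streams that lack it.
            stream = opened.markSupported() ? opened : new BufferedInputStream(opened);
            metadata.put(CONTENT_TYPE, detect(stream));
        } catch (IOException | RuntimeException e) {
            // Pre-parse failure: record it instead of letting it propagate.
            metadata.put(EMBEDDED_STREAM_EXCEPTION, stackTrace(e));
            closeQuietly(stream);
            return false;
        }
        try {
            // Exceptions thrown by the parser itself are out of scope here.
            parser.accept(stream, metadata);
        } finally {
            closeQuietly(stream);
        }
        return true;
    }

    /** Toy detection: peeks at the first bytes, then rewinds the stream. */
    static String detect(InputStream stream) throws IOException {
        stream.mark(4);
        byte[] head = new byte[4];
        int n = stream.read(head);
        stream.reset();
        if (n >= 2 && head[0] == 'P' && head[1] == 'K') {
            return "application/zip";
        }
        return "application/octet-stream";
    }

    static String stackTrace(Throwable t) {
        StringWriter out = new StringWriter();
        PrintWriter writer = new PrintWriter(out);
        t.printStackTrace(writer);
        writer.flush();
        return out.toString();
    }

    static void closeQuietly(InputStream stream) {
        if (stream == null) {
            return;
        }
        try {
            stream.close();
        } catch (IOException ignored) {
            // Nothing useful can be done if close fails.
        }
    }

    public static void main(String[] args) {
        Map<String, String> good = new LinkedHashMap<>();
        processEmbedded(() -> new ByteArrayInputStream(new byte[] {'P', 'K', 3, 4}),
                good, (in, md) -> { });
        System.out.println(good);

        Map<String, String> bad = new LinkedHashMap<>();
        processEmbedded(() -> { throw new IOException("invalid stored block lengths"); },
                bad, (in, md) -> { });
        System.out.println(bad.keySet());
    }
}
```

Returning a boolean rather than rethrowing lets a container parser continue with its remaining embedded objects, while the stored stack trace keeps the failure visible to whoever inspects the metadata.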

Issue Links

is depended upon by

TIKA-2204 IndexOutOfBoundsException on a valid Powerpoint file (Resolved)

TIKA-2215 TikaException about "Invalid embedded resource" on a valid PPT file (Resolved)

is related to

TIKA-2130 TaggedIOException from ZipException on a valid PowerPoint file (Resolved)

TIKA-2157 HSLFException on a valid Powerpoint file (Resolved)

TIKA-2161 EOFException on a valid Powerpoint file (Resolved)

TIKA-2164 HSLFException from ZipException "invalid stored block lengths" on a valid Powerpoint file (Resolved)

People

Assignee: Unassigned

Reporter: Tim Allison

Votes: 0

Watchers: 3

Dates

Created: 04/Nov/16 13:18

Updated: 12/Apr/21 13:00

Resolved: 13/Jan/17 12:33