Description
When parsing an embedded document throws an exception, we currently catch it and either swallow it in ParsingEmbeddedDocumentExtractor (the default) or report it in RecursiveParserWrapper by storing the stack trace in the embedded document's Metadata.
However, if an exception is thrown before the stream reaches the parser, either during detection on the embedded stream or while obtaining the stream, we don't handle it uniformly or robustly across parsers.
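One way to picture the uniform handling described above is a single guard around stream acquisition, detection, and parsing, so that a failure at any of those stages is recorded the same way. The following is only a minimal sketch under stated assumptions: it does not use Tika's actual classes, and `EmbeddedGuard`, `StreamSupplier`, `StreamHandler`, and the `embedded-exception` key are hypothetical names invented for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Map;

// Hypothetical helper, not part of Tika: one guard for every stage of embedded handling.
class EmbeddedGuard {

    /** Hypothetical stand-in for obtaining the embedded stream; may fail before any parser runs. */
    interface StreamSupplier {
        InputStream open() throws IOException;
    }

    /** Hypothetical stand-in for detection followed by parsing of the embedded stream. */
    interface StreamHandler {
        void handle(InputStream stream, Map<String, String> metadata) throws Exception;
    }

    // Hypothetical metadata key; a real implementation would choose its own.
    static final String EXCEPTION_KEY = "embedded-exception";

    /**
     * Opens the stream and hands it to the handler. Any exception, whether thrown
     * while opening the stream or inside the handler, is stored as a stack trace
     * in the embedded document's metadata instead of propagating to the caller.
     *
     * @return true if the embedded document was processed without an exception
     */
    static boolean process(StreamSupplier supplier, StreamHandler handler,
                           Map<String, String> metadata) {
        try (InputStream stream = supplier.open()) {
            handler.handle(stream, metadata);
            return true;
        } catch (Exception e) {
            // Covers checked exceptions and runtime exceptions alike.
            StringWriter trace = new StringWriter();
            e.printStackTrace(new PrintWriter(trace, true));
            metadata.put(EXCEPTION_KEY, trace.toString());
            return false;
        }
    }
}
```

With a guard like this, a caller could keep processing the remaining attachments of the container document when `process` returns false, and inspect the stored stack trace afterwards. Whether the trace is recorded or simply swallowed is a policy decision that this sketch leaves open.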
Issue Links
- is depended upon by
  - TIKA-2204 IndexOutOfBoundsException on a valid Powerpoint file (Resolved)
  - TIKA-2215 TikaException about "Invalid embedded resource" on a valid PPT file (Resolved)
- is related to
  - TIKA-2130 TaggedIOException from ZipException on a valid PowerPoint file (Resolved)
  - TIKA-2157 HSLFException on a valid Powerpoint file (Resolved)
  - TIKA-2161 EOFException on a valid Powerpoint file (Resolved)
  - TIKA-2164 HSLFException from ZipException "invalid stored block lengths" on a valid Powerpoint file (Resolved)