Details
Type: Task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
With the recently modified tika-eval dev code, which captures exceptions from embedded documents, we see ~30k exceptions in govdocs1 for objects that we currently identify as xls files embedded in ppt and xls files.
It turns out that these are Microsoft Chart files/objects, which we are currently misidentifying as xls. Let's add MIME detection for these embedded objects and see whether we can use POI to parse the contents of embedded tables when they are present.
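One way to tell a chart object from a real workbook is the CLSID stored on the root storage of the embedded OLE2 (Compound File) blob. A chart object presumably still carries a Workbook stream, which would explain why detection based on stream names reports xls. The following is a minimal, stdlib-only Java sketch of that idea, not Tika's implementation. The class name `OleClsidSniffer`, its methods, and the returned labels are hypothetical. It assumes a well-formed, unencrypted compound file whose first directory sector is readable directly from the header (no FAT walking). The three CLSIDs are the well-known registry values for Excel.Sheet.8, Excel.Chart.8 and MSGraph.Chart.8; mapping them to actual MIME types is left open.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Map;

/** Hypothetical sketch: label an embedded OLE2 blob by its root-storage CLSID. */
public class OleClsidSniffer {
    // Compound File Binary (OLE2) signature: first 8 bytes of the header.
    private static final byte[] CFB_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Well-known CLSIDs -> labels (assumed mapping; MIME types are up to the caller).
    static final Map<String, String> KNOWN = Map.of(
        "{00020820-0000-0000-C000-000000000046}", "Excel.Sheet.8",
        "{00020821-0000-0000-C000-000000000046}", "Excel.Chart.8",
        "{00020803-0000-0000-C000-000000000046}", "MSGraph.Chart.8");

    /** Returns the root storage CLSID as "{XXXXXXXX-XXXX-...}", or null if not parseable. */
    static String rootClsid(byte[] data) {
        if (data.length < 512) return null;
        for (int i = 0; i < CFB_MAGIC.length; i++) {
            if (data[i] != CFB_MAGIC[i]) return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int sectorShift = buf.getShort(0x1E) & 0xFFFF; // 9 -> 512-byte, 12 -> 4096-byte sectors
        if (sectorShift != 9 && sectorShift != 12) return null;
        int firstDirSector = buf.getInt(0x30);          // first directory sector number
        if (firstDirSector < 0) return null;            // ENDOFCHAIN or other special value
        long sectorSize = 1L << sectorShift;
        // Sector N starts after the header, i.e. at (N + 1) * sectorSize.
        long clsidOffset = (firstDirSector + 1L) * sectorSize + 0x50; // CLSID inside root entry
        if (clsidOffset + 16 > data.length) return null;
        int off = (int) clsidOffset;
        // GUID layout: Data1..Data3 little-endian, Data4 as raw bytes.
        StringBuilder sb = new StringBuilder(String.format("{%08X-%04X-%04X-",
            buf.getInt(off), buf.getShort(off + 4) & 0xFFFF, buf.getShort(off + 6) & 0xFFFF));
        for (int i = 8; i < 16; i++) {
            if (i == 10) sb.append('-');
            sb.append(String.format("%02X", data[off + i] & 0xFF));
        }
        return sb.append('}').toString();
    }

    /** Returns a label for the blob: a known name, "unknown:{clsid}", or "not-ole2". */
    static String describe(byte[] data) {
        String clsid = rootClsid(data);
        if (clsid == null) return "not-ole2";
        return KNOWN.getOrDefault(clsid, "unknown:" + clsid);
    }
}
```

Inside Tika, this would more likely be read through POI (for example `DirectoryEntry.getStorageClsid()` on the embedded storage) than by parsing header bytes by hand. The hand-rolled version only shows where the information lives.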
Issue Links
- duplicates TIKA-1033: Tika doesn't parse embedded OLE Chart/Graph objects (Open)