Description
For the attached docx, Tika seems to detect the embedded object, as shown by this tag:
<div class="embedded" id="rId10"/>
However, extraction itself (using -z on the command line, or using the API) does not seem to work for this object. Only the image (rId9) is extracted, and nothing is written for rId10:
java -jar tika-app-1.4.jar -z Doc\ w\ Structure\ that\ wont\ extract.docx
Extracting 'rId9_image1.wmf' (application/x-msmetafile) to /tmp/tika/rId9_image1.wmf
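To compare what the .docx package actually contains against what -z extracts, a minimal sketch like the one below may help. It uses only the JDK zip classes, not the Tika API. The class name ListDocxEmbeddings and the method listEmbeddedParts are hypothetical, and the sketch assumes the embedded parts live under word/embeddings/ and word/media/, which is where Word conventionally puts them (strictly, parts are located through the package relationships, so other locations are possible).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Tika: lists candidate embedded parts in a .docx.
public class ListDocxEmbeddings {

    // Returns the names of all non-directory zip entries under
    // word/embeddings/ or word/media/ (assumed conventional locations).
    static List<String> listEmbeddedParts(String docxPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(docxPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                boolean candidate = name.startsWith("word/embeddings/")
                        || name.startsWith("word/media/");
                if (!entry.isDirectory() && candidate) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ListDocxEmbeddings <file.docx>
        for (String name : listEmbeddedParts(args[0])) {
            System.out.println(name);
        }
    }
}
```

If this lists an entry under word/embeddings/ that has no counterpart in the "Extracting ..." output, the object is present in the package and is being skipped during extraction rather than being absent from the file.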
Attachments
Issue Links
- is related to TIKA-1072 AIOOBE when handling embedded document in .doc file (Resolved)