Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.19.1
None
None
Tika CLI
Description
An embedded document (.doc) that itself contains another embedded document (.doc) is not extracted correctly by the Tika CLI (FileEmbeddedDocumentExtractor).
Only the first level of embedded documents opens correctly. Any embedded document nested inside an extracted document gives an error in MS Word: "The server application, source file or item cannot be found".
Test file attached.
To reproduce, run the following command and open the extracted document. It contains another embedded document, which does not open.
java -jar tika-app-1.19.1.jar -v -z Recursive-Embedding-test.doc