Details
- Type: Improvement
- Status: Reopened
- Priority: Trivial
- Resolution: Unresolved
Description
Motivation: support extracting embedded files from PDF, Word (doc/docx), and similar formats.
According to https://stackoverflow.com/questions/20172465/get-embedded-resourses-in-doc-files-using-apache-tika, it is possible to recursively parse a document and save its sub-items (e.g. images) to a folder using FileEmbeddedDocumentExtractor. However, that class is only accessible inside TikaCLI.
I think it should be available to applications that use Tika, not only to the CLI.
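Until such a class is exposed, an application can approximate the behaviour by registering its own EmbeddedDocumentExtractor in the ParseContext. The following is a minimal sketch, not the TikaCLI implementation: the class names SavingExtractor and ExtractEmbedded are hypothetical, the "resourceName" metadata key is assumed to carry the embedded file name, and tika-core plus the parser modules are assumed to be on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import org.apache.tika.extractor.EmbeddedDocumentExtractor;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Hypothetical application-side extractor (not part of Tika):
// writes every embedded item it is handed into outputDir.
class SavingExtractor implements EmbeddedDocumentExtractor {
    private final Path outputDir;
    private int count = 0;

    SavingExtractor(Path outputDir) {
        this.outputDir = outputDir;
    }

    @Override
    public boolean shouldParseEmbedded(Metadata metadata) {
        return true; // accept every embedded item
    }

    @Override
    public void parseEmbedded(InputStream stream, ContentHandler handler,
                              Metadata metadata, boolean outputHtml)
            throws SAXException, IOException {
        // Assumption: parsers put the embedded file name under "resourceName".
        String name = metadata.get("resourceName");
        if (name == null) {
            name = "";
        }
        // Keep only the last path segment so items cannot escape outputDir.
        int cut = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        String fileName = name.substring(cut + 1);
        if (fileName.isEmpty()) {
            fileName = "embedded-" + count;
        }
        count++;
        Files.createDirectories(outputDir);
        // The stream is owned by the calling parser, so it is not closed here.
        Files.copy(stream, outputDir.resolve(fileName),
                   StandardCopyOption.REPLACE_EXISTING);
    }
}

class ExtractEmbedded {
    public static void main(String[] args) throws Exception {
        Path input = Paths.get(args[0]);      // document to parse
        Path outputDir = Paths.get(args[1]);  // folder for embedded items

        ParseContext context = new ParseContext();
        context.set(EmbeddedDocumentExtractor.class, new SavingExtractor(outputDir));

        try (InputStream in = Files.newInputStream(input)) {
            // -1 disables the default write limit of BodyContentHandler.
            new AutoDetectParser().parse(in, new BodyContentHandler(-1),
                                         new Metadata(), context);
        }
    }
}
```

This sketch only saves the items that the container parser hands over; it does not parse the saved items again, so attachments nested inside attachments are not extracted. A reusable class shipped with Tika would spare each application from writing this boilerplate.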