Details
Improvement
Status: Reopened
Major
Resolution: Unresolved
1.14
None
Description
The Office format parsers support including or excluding deleted text and moved text. It would be good to support something similar for shape-based text, though probably not for PPT / PPTX, as that's almost all shape-based!
(This has been done hackily in the Alfresco fork of Tika at https://github.com/Alfresco/tika/commit/32aca3fd96816ad49b869a82c9ba0f02265f8744 but it would be good to do it properly.)