Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
When parsing a PPT file, Tika wraps the content of each slide in:
<div class="slide">
For PPTX, however, these wrappers seem to be missing: all the text sits directly under <body>.
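One way to check for the difference is to count the slide wrappers in the XHTML that comes out of each parse. The following is a minimal sketch using only the JDK's XML parser. The class name `SlideDivCheck`, the method `countSlideDivs`, and the two sample strings are hypothetical: the strings are hand-written illustrations of the two shapes described above, not real Tika output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SlideDivCheck {

    // Hand-written sample: each slide wrapped in <div class="slide"> (the PPT shape).
    public static final String PPT_LIKE =
        "<html><body>"
        + "<div class=\"slide\"><p>Slide one</p></div>"
        + "<div class=\"slide\"><p>Slide two</p></div>"
        + "</body></html>";

    // Hand-written sample: text directly under <body> (the PPTX shape).
    public static final String PPTX_LIKE =
        "<html><body><p>Slide one</p><p>Slide two</p></body></html>";

    // Counts the <div> elements whose class attribute is exactly "slide".
    public static int countSlideDivs(String xhtml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            NodeList divs = doc.getElementsByTagName("div");
            int count = 0;
            for (int i = 0; i < divs.getLength(); i++) {
                Element div = (Element) divs.item(i);
                if ("slide".equals(div.getAttribute("class"))) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            // Input that is not well-formed XML is reported as a bad argument.
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("PPT-like:  " + countSlideDivs(PPT_LIKE));
        System.out.println("PPTX-like: " + countSlideDivs(PPTX_LIKE));
    }
}
```

Running the same count over the actual XHTML produced for a PPT and a PPTX version of the same deck should show whether the wrappers are present in one and absent in the other.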
Issue Links
- is duplicated by: TIKA-1841 Different XML output structure for PPT and PPTX (Open)