We should upgrade to Apache POI 3.9, which is the latest version.
Added some code to HSLFExtractor, similar to the fix for POI-54722. Uncommented the old test. Text is now extracted from tables in HSLF.
OK, as this improvement has been done (3.9 is there), should we close this issue properly and create another one to handle the regression effects?
This bug remains open because some of the text that POI 3.8 used to produce (table text) is not extracted when using POI 3.9. It stays open for as long as that issue remains and the affected bits of PowerPointParserTest stay commented out.
Strange, I already have POI 3.9 in the dependencies of Tika 1.4.
I've done the upgrade in r1442159, and tweaked a few bits for the HSMF changes.
However, I had to disable a couple of the HSLF-related .ppt checks, as table text is no longer coming through. I've posted something to the POI list about this; hopefully we can get the fix in and the checks back shortly.