Created attachment 22361
Contains JUnit test class and documents used for testing.
The text contained in the header and footer of a PowerPoint 2003 document is not extracted. The attachment contains the JUnit test class and the documents used for testing. We expected to extract the words "testdoc" and "test phrase".
Notes on the attached documents:
- The document "Header.ppt" contains the words "testdoc" and "test phrase" in the header.
- The document "Footer.ppt" contains the words "testdoc" and "test phrase" in the footer.
"TestUnitPoi35Filter.java" is the JUnit class.
There seems to be a bug in the header/footer support in HSLF. For some reason, the headers and footers are showing as null. I've updated the PowerPoint text extractor to include the headers and footers when found, but for now that won't help as they're not coming through. There's a disabled unit test in svn trunk for this, for when someone has a chance to look at PowerPoint headers/footers. It's in src/scratchpad/testcases/org/apache/poi/hslf/extractor/TextExtractor.java
I recently implemented usermodel support for headers / footers in HSLF:
http://poi.apache.org/hslf/how-to-shapes.html#HeadersFooters
You may want to leverage it in the text extractor.
Yegor
The problem is that the HeaderFooter object is returning null for the text in cases where I know there to be text there :(
(In reply to comment #3)
> The problem is that the HeaderFooter object is returning null for the text in
> cases where I know there to be text there :(
You only check headers / footers for slides. The missing ones are notes headers/footers. Add the following code and all should be fine:
    if (getNoteText) {
        HeadersFooters hd = _show.getNotesHeadersFooters();
        if (hd.isFooterVisible()) {
            ret.append(hd.getFooterText() + "\n");
        }
        if (hd.isHeaderVisible()) {
            ret.append(hd.getHeaderText() + "\n");
        }
    }
Yegor
Thanks for that spot, Yegor. I hadn't realised that the header/footer could be on notes too :/ Now fixed in svn trunk - slide headers/footers are included by default, and notes headers/footers are included when notes are extracted.
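For anyone reading this later, the behaviour described above (slide headers/footers always, notes headers/footers only when notes are extracted) can be sketched roughly as follows. This is not the code committed to trunk and it does not use POI at all: HeaderFooterSource, FixedSource, appendHeaderFooter and extract are hypothetical names, with the interface merely mirroring the four HeadersFooters calls from the snippet in the earlier comment. The null check on the text is an added assumption, prompted by the null values reported earlier in this thread.

```java
final class HeaderFooterSketch {

    /** Hypothetical stand-in for the four HeadersFooters calls quoted in the thread. */
    interface HeaderFooterSource {
        boolean isHeaderVisible();
        String getHeaderText();
        boolean isFooterVisible();
        String getFooterText();
    }

    /** Fixed-value implementation, for illustration only. */
    static final class FixedSource implements HeaderFooterSource {
        private final boolean headerVisible;
        private final String headerText;
        private final boolean footerVisible;
        private final String footerText;

        FixedSource(boolean headerVisible, String headerText,
                    boolean footerVisible, String footerText) {
            this.headerVisible = headerVisible;
            this.headerText = headerText;
            this.footerVisible = footerVisible;
            this.footerText = footerText;
        }

        public boolean isHeaderVisible() { return headerVisible; }
        public String getHeaderText() { return headerText; }
        public boolean isFooterVisible() { return footerVisible; }
        public String getFooterText() { return footerText; }
    }

    /** Appends footer then header (same order as the quoted snippet), skipping hidden or null text. */
    static void appendHeaderFooter(StringBuilder ret, HeaderFooterSource hf) {
        if (hf == null) {
            return;
        }
        if (hf.isFooterVisible() && hf.getFooterText() != null) {
            ret.append(hf.getFooterText()).append('\n');
        }
        if (hf.isHeaderVisible() && hf.getHeaderText() != null) {
            ret.append(hf.getHeaderText()).append('\n');
        }
    }

    /** Slide headers/footers are always included; notes ones only when getNoteText is true. */
    static String extract(HeaderFooterSource slideHf, HeaderFooterSource notesHf,
                          boolean getNoteText) {
        StringBuilder ret = new StringBuilder();
        appendHeaderFooter(ret, slideHf);
        if (getNoteText) {
            appendHeaderFooter(ret, notesHf);
        }
        return ret.toString();
    }
}
```

In the real extractor the two sources would presumably be whatever the slide show returns for its slide and notes headers/footers; here they are passed in directly so the sketch stays independent of the library.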