Bug 45537 - HSLF headers and footers being returned as null
Summary: HSLF headers and footers being returned as null
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HSLF
Version: 3.0-dev
Hardware: PC Windows Server 2003
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-04 08:06 UTC by xtrim
Modified: 2008-08-05 15:49 UTC



Attachments
Contains JUnit test class and documents used for testing. (7.15 KB, application/x-zip-compressed)
2008-08-04 08:06 UTC, xtrim

Description xtrim 2008-08-04 08:06:24 UTC
Created attachment 22361
Contains JUnit test class and documents used for testing.

The text contained in the header and footer of a PowerPoint 2003 document is not extracted.
The attachment contains the JUnit test class and the documents used for testing.
We expected the extracted text to include the word "testdoc" and the phrase "test phrase".

Notes on the attached documents:

- the document "Header.ppt" contains the words "testdoc" and "test phrase" in the header.

- the document "Footer.ppt" contains the words "testdoc" and "test phrase" in the footer.


"TestUnitPoi35Filter.java" is the JUnit class.
Comment 1 Nick Burch 2008-08-05 09:35:43 UTC
There seems to be a bug in the header/footer support in HSLF. For some reason, the headers and footers are showing as null.

I've updated the PowerPoint text extractor to include the headers and footers when found, but for now that won't help, as they're not coming through.

There's a disabled unit test in svn trunk for this, for when someone has a chance to look at PowerPoint headers/footers. It's in src/scratchpad/testcases/org/apache/poi/hslf/extractor/TextExtractor.java
Comment 2 Yegor Kozlov 2008-08-05 09:55:17 UTC
I recently implemented usermodel support for headers / footers in HSLF:
http://poi.apache.org/hslf/how-to-shapes.html#HeadersFooters

You may want to leverage it in the text extractor.

Yegor
Comment 3 Nick Burch 2008-08-05 10:05:42 UTC
The problem is that the HeaderFooter object is returning null for the text in cases where I know there to be text there :(
Comment 4 Yegor Kozlov 2008-08-05 10:25:11 UTC
(In reply to comment #3)
> The problem is that the HeaderFooter object is returning null for the text in
> cases where I know there to be text there :(
>

You only check headers / footers for slides. The missing ones are notes headers/footers.

Add the following code and all should be fine: 

if (getNoteText) {
    // Notes pages have their own header/footer settings, separate from the slides
    HeadersFooters hd = _show.getNotesHeadersFooters();
    if (hd.isFooterVisible()) {
        ret.append(hd.getFooterText() + "\n");
    }
    if (hd.isHeaderVisible()) {
        ret.append(hd.getHeaderText() + "\n");
    }
}

Yegor
Comment 5 Nick Burch 2008-08-05 15:49:44 UTC
Thanks for that spot Yegor, I hadn't realised that the header/footer could be on notes too :/

Now fixed in svn trunk - slide headers/footers are included by default, and notes headers/footers are included when notes are extracted
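
For readers following along, here is a minimal, self-contained sketch of the behaviour described above: slide headers/footers are always appended, and notes headers/footers only when notes are requested. It does not use POI itself. `HeaderFooterSource` is a hypothetical stand-in for the four accessors used in comment 4, `fixed(...)` is a hypothetical test helper, and the null check is an assumption added here for safety, not something this thread says is part of the committed fix.

```java
final class HeaderFooterSketch {

    /** Hypothetical stand-in for the four accessors used in comment 4. */
    interface HeaderFooterSource {
        boolean isHeaderVisible();
        String getHeaderText();
        boolean isFooterVisible();
        String getFooterText();
    }

    /** Hypothetical helper: builds a source that returns fixed values. */
    static HeaderFooterSource fixed(final boolean headerVisible, final String header,
                                    final boolean footerVisible, final String footer) {
        return new HeaderFooterSource() {
            public boolean isHeaderVisible() { return headerVisible; }
            public String getHeaderText() { return header; }
            public boolean isFooterVisible() { return footerVisible; }
            public String getFooterText() { return footer; }
        };
    }

    /** Appends the text plus a newline only if it is visible and not null. */
    private static void appendIfPresent(StringBuilder out, boolean visible, String text) {
        if (visible && text != null) {
            out.append(text).append('\n');
        }
    }

    /**
     * Slide header/footer text is always included; notes header/footer
     * text is included only when getNoteText is true.
     */
    static String extract(HeaderFooterSource slides, HeaderFooterSource notes,
                          boolean getNoteText) {
        StringBuilder ret = new StringBuilder();
        appendIfPresent(ret, slides.isFooterVisible(), slides.getFooterText());
        appendIfPresent(ret, slides.isHeaderVisible(), slides.getHeaderText());
        if (getNoteText) {
            appendIfPresent(ret, notes.isFooterVisible(), notes.getFooterText());
            appendIfPresent(ret, notes.isHeaderVisible(), notes.getHeaderText());
        }
        return ret.toString();
    }
}
```

With real HSLF objects, the two sources would presumably be the slide-level and notes-level `HeadersFooters` instances; only `getNotesHeadersFooters()` is named in this thread, so the slide-level lookup is left out of the sketch.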