Bug 45556 - [PATCH] poi-3.5-beta1-20080718.jar - content from the foot notes of a 2007 docx document is not extracted.
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: unspecified
Hardware: PC Windows Server 2003
Importance: P2 normal with 1 vote
Target Milestone: ---
Assignee: POI Developers List
Reported: 2008-08-05 05:45 UTC by xtrim
Modified: 2009-07-18 02:44 UTC



Attachments
Contains JUnit test class and documents used for testing. (68.67 KB, application/x-zip-compressed)
2008-08-05 05:45 UTC, xtrim
src/scratchpad/testcases/org/apache/poi/hwpf/data/snoska.docx (12.52 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2009-07-14 06:04 UTC, Maxim Valyanskiy
patch (9.71 KB, patch)
2009-07-14 06:06 UTC, Maxim Valyanskiy
Additional patch that adds text extraction of footnotes in tables (8.34 KB, patch)
2009-07-17 01:12 UTC, Maxim Valyanskiy
src/scratchpad/testcases/org/apache/poi/hwpf/data/Table.docx (12.87 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2009-07-17 01:12 UTC, Maxim Valyanskiy
XWPFFootnote.java (604 bytes, text/x-java)
2009-07-17 03:46 UTC, Maxim Valyanskiy
Additional patch for endnotes (6.80 KB, patch)
2009-07-17 05:40 UTC, Maxim Valyanskiy
src/scratchpad/testcases/org/apache/poi/hwpf/data/A Nepalese name for Tilaka.docx (13.24 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2009-07-17 05:41 UTC, Maxim Valyanskiy

Description xtrim 2008-08-05 05:45:06 UTC
Created attachment 22379
Contains JUnit test class and documents used for testing.

Text in the footnotes (the notes placed at the bottom of a page) of a Word 2007 document is not extracted.
The attachment contains the JUnit test class and the documents used for testing.
We expected the extracted text to include the words "testdoc" and "test phrase".

Notes on the attached documents:

- the documents "classic_FootNote.docx" and "form_FootNotes.docx" contain the words "testdoc" and "test phrase" in their footnotes.


"TestUnitPoi35Filter.java" is the JUnit class.
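
For readers who want to check where the missing text lives: in a .docx package, footnote content is stored in the part word/footnotes.xml, separate from word/document.xml, with the visible text in w:t elements. The following is a minimal JDK-only sketch that reads that part directly. It is not the POI patch and does not use the POI API; the class name FootnoteTextSketch and the method extractFootnoteText are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Apache POI.
final class FootnoteTextSketch {
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    private static final String FOOTNOTES_PART = "word/footnotes.xml";

    /** Returns footnote text, one line per non-empty paragraph, or "" if the part is absent. */
    static String extractFootnoteText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (FOOTNOTES_PART.equals(entry.getName())) {
                    return textOfParagraphs(readCurrentEntry(zip));
                }
            }
        }
        return "";
    }

    // Reads the bytes of the entry the zip stream is currently positioned on.
    private static byte[] readCurrentEntry(ZipInputStream zip) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = zip.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Concatenates the w:t runs of each w:p; paragraphs without text
    // (such as the separator footnotes Word writes) are skipped.
    private static String textOfParagraphs(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder text = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            StringBuilder line = new StringBuilder();
            for (int j = 0; j < runs.getLength(); j++) {
                line.append(runs.item(j).getTextContent());
            }
            if (line.length() > 0) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }
}
```

Assuming the attached documents are ordinary .docx packages, passing a FileInputStream for "classic_FootNote.docx" to extractFootnoteText should show whether "testdoc" and "test phrase" are present in the footnotes part, independently of what the POI extractor returns.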
Comment 1 Maxim Valyanskiy 2009-07-14 06:01:27 UTC
I created a patch that adds text extraction for docx footnotes. Please review my solution; I'm going to add endnote extraction in the same way.
Comment 2 Maxim Valyanskiy 2009-07-14 06:04:02 UTC
Created attachment 23975
src/scratchpad/testcases/org/apache/poi/hwpf/data/snoska.docx
Comment 3 Maxim Valyanskiy 2009-07-14 06:06:41 UTC
Created attachment 23976
patch
Comment 4 Maxim Valyanskiy 2009-07-17 01:12:11 UTC
Created attachment 24000
Additional patch that adds text extraction of footnotes in tables
Comment 5 Maxim Valyanskiy 2009-07-17 01:12:50 UTC
Created attachment 24001
src/scratchpad/testcases/org/apache/poi/hwpf/data/Table.docx
Comment 6 Yegor Kozlov 2009-07-17 02:32:34 UTC
Maxim,

XWPFFootnote.java is missing from the patch. Please attach it; I'm going to look into it this weekend.

Regards,
Yegor
Comment 7 Maxim Valyanskiy 2009-07-17 03:46:01 UTC
Created attachment 24003
XWPFFootnote.java

oops :-)
Comment 8 Maxim Valyanskiy 2009-07-17 05:40:48 UTC
Created attachment 24004
Additional patch for endnotes
Comment 9 Maxim Valyanskiy 2009-07-17 05:41:34 UTC
Created attachment 24005
src/scratchpad/testcases/org/apache/poi/hwpf/data/A Nepalese name for Tilaka.docx
Comment 10 Yegor Kozlov 2009-07-18 02:44:59 UTC
Patch applied to svn trunk with some minor tweaks.

Thanks,
Yegor