getFootnoteText can fail with different exceptions caused by incorrect range calculations for an empty footnote block:

java.lang.ArrayIndexOutOfBoundsException: 60
    at org.apache.poi.util.LittleEndian.getShort(LittleEndian.java:46)
    at org.apache.poi.hwpf.sprm.SprmOperation.<init>(SprmOperation.java:54)
    at org.apache.poi.hwpf.sprm.SprmIterator.next(SprmIterator.java:45)
    at org.apache.poi.hwpf.sprm.ParagraphSprmUncompressor.uncompressPAP(ParagraphSprmUncompressor.java:58)
    at org.apache.poi.hwpf.model.PAPX.getParagraphProperties(PAPX.java:130)
    at org.apache.poi.hwpf.usermodel.Range.getParagraph(Range.java:819)
    at org.apache.poi.hwpf.extractor.WordExtractor.getParagraphText(WordExtractor.java:140)
    at org.apache.poi.hwpf.extractor.WordExtractor.getFootnoteText(WordExtractor.java:121)

or

java.lang.IllegalArgumentException: The end (1062) must not be before the start (2029)
    at org.apache.poi.hwpf.usermodel.Range.sanityCheckStartEnd(Range.java:244)
    at org.apache.poi.hwpf.usermodel.Range.<init>(Range.java:178)
    at org.apache.poi.hwpf.usermodel.Paragraph.<init>(Paragraph.java:98)
    at org.apache.poi.hwpf.usermodel.Range.getParagraph(Range.java:827)
    at org.apache.poi.hwpf.extractor.WordExtractor.getParagraphText(WordExtractor.java:139)
    at org.apache.poi.hwpf.extractor.WordExtractor.getFootnoteText(WordExtractor.java:120)
The patch does two things:

1) Adds footnote/endnote extraction to WordExtractor.getText(). On its own this breaks TestWordExtractor, since the POI test suite already contains a file with this problem.
2) Fixes a bug in Range.findRange, which also fixes the TestWordExtractor test case.
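For anyone reading without the attachment, here is a rough, self-contained sketch of the failure mode. It is not the code from the patch and it does not use POI: the class EmptyRangeSketch and its methods are hypothetical, and the check only imitates the message seen in the second stack trace. It assumes an empty block can be detected as start == end and handled before any lookup.

```java
// Hypothetical illustration only; not the actual change to Range.findRange.
final class EmptyRangeSketch {
    private EmptyRangeSketch() {}

    // Imitates the kind of check that raised the IllegalArgumentException above.
    static void sanityCheckStartEnd(int start, int end) {
        if (start < 0) {
            throw new IllegalArgumentException("Range start must not be negative. Given " + start);
        }
        if (end < start) {
            throw new IllegalArgumentException(
                "The end (" + end + ") must not be before the start (" + start + ")");
        }
    }

    // Returns the text in [start, end). An empty block (start == end) yields
    // an empty string without touching the underlying text at all.
    static String extract(String text, int start, int end) {
        sanityCheckStartEnd(start, end);
        if (start == end) {
            return "";
        }
        if (end > text.length()) {
            throw new IllegalArgumentException(
                "The end (" + end + ") is past the text length (" + text.length() + ")");
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String doc = "body text|footnote";
        System.out.println(extract(doc, 10, 18)); // the footnote part
        System.out.println(extract(doc, 18, 18).isEmpty()); // empty block is fine
        try {
            extract(doc, 2029, 1062); // the offsets from the report
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that an empty block should short-circuit to empty text, while an inverted range such as (2029, 1062) indicates that the offsets themselves were computed incorrectly upstream.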
Created attachment 23987: patch
Patch applied in r795333.

Thanks, Yegor