Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
0.8.0-incubator
None
0.8.0-incubator as well as checkout from SVN (rev#767932).
Not affected: latest sf.net release (0.7.3)
Description
PDFTextStripperByArea does not return any text from pages.
This is caused by a check on the first line of PDFTextStripper#processPage() that compares currentPageNo (initially 0) against startPage (initially 1). Since PDFTextStripperByArea sets neither startPage nor currentPageNo, the check always fails and no text is extracted.
A possible fix is to include the following code in PDFTextStripperByArea#extractRegions right before the call to processPage():
setStartPage(0);
setEndPage(0);
Since I'm not very familiar with the PDFBox internals, this might be more of a hack than a solid fix.
The issue was introduced in PDFTextStripper revision 1.70 (old SF.net CVS), where the currentPage++ was removed from just before the check in processPage().
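To illustrate the failing check and the proposed workaround, here is a minimal, self-contained sketch. It is not PDFBox code: the class PageRangeCheckSketch, the nested Stripper class, and the helper methods are hypothetical stand-ins, and the exact form of the range check (currentPageNo >= startPage && currentPageNo <= endPage) and the default for endPage are assumptions based on the description above.

```java
public class PageRangeCheckSketch {

    // Hypothetical, simplified stand-in for the stripper's page-range state.
    static class Stripper {
        int currentPageNo = 0;            // initial value per the report
        int startPage = 1;                // initial value per the report
        int endPage = Integer.MAX_VALUE;  // assumed default: no upper bound
        final StringBuilder out = new StringBuilder();

        void setStartPage(int p) { startPage = p; }
        void setEndPage(int p) { endPage = p; }

        // Assumed shape of the check: text is only collected when the
        // current page number lies inside [startPage, endPage].
        void processPage(String pageText) {
            if (currentPageNo >= startPage && currentPageNo <= endPage) {
                out.append(pageText);
            }
        }
    }

    // Mirrors the reported bug: nothing adjusts the page range before
    // processPage(), so 0 >= 1 is false and the text is dropped.
    static String extractWithDefaults(String pageText) {
        Stripper s = new Stripper();
        s.processPage(pageText);
        return s.out.toString();
    }

    // Mirrors the proposed workaround: pin the range to page 0 first.
    static String extractWithWorkaround(String pageText) {
        Stripper s = new Stripper();
        s.setStartPage(0);
        s.setEndPage(0);
        s.processPage(pageText);
        return s.out.toString();
    }

    public static void main(String[] args) {
        System.out.println("defaults:   [" + extractWithDefaults("region text") + "]");
        System.out.println("workaround: [" + extractWithWorkaround("region text") + "]");
    }
}
```

With the defaults the check rejects the page and the result is empty; with the range pinned to 0 the page text is returned. Whether pinning the range is the right long-term fix, as opposed to restoring the page counter increment, is a design question this sketch does not settle.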