Resolution: Not A Problem
Affects Version/s: 2.0.6, 2.0.7
Fix Version/s: None
Component/s: Text extraction
Environment: Mac OS X under Eclipse
I am quite unfamiliar with PDFBox. Still, I spent some time trying to figure out how to solve the following issue.
There is a problem extracting text from the attached PDF. As you can see, the PDF contains the text "Mapping Twitter topic networks: ... " through "... hub and spokes", but the result of PDFTextStripper getText() does not contain any of these characters.
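The check described above can be sketched roughly as follows. This is only an illustration, not the reporter's actual code: the class name `ExtractionCheck` and its helper methods are hypothetical, and the extracted text is passed in as a plain string so the sketch runs without PDFBox on the classpath. With PDFBox 2.0.x the string would presumably come from `PDFTextStripper.getText(PDDocument)` on a document opened with `PDDocument.load(File)`, as noted in the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reports which expected phrases are absent from extracted text.
class ExtractionCheck {

    // Collapse whitespace runs so that line breaks inserted by the
    // extractor do not hide an otherwise matching phrase.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Return the expected phrases that do not occur in the extracted text.
    static List<String> missingPhrases(String extractedText, List<String> expected) {
        String haystack = normalize(extractedText);
        List<String> missing = new ArrayList<>();
        for (String phrase : expected) {
            if (!haystack.contains(normalize(phrase))) {
                missing.add(phrase);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: with PDFBox 2.0.x the text would be obtained along these lines:
        //   try (PDDocument doc = PDDocument.load(new File("attachment.pdf"))) {
        //       text = new PDFTextStripper().getText(doc);
        //   }
        // A placeholder string stands in for that output here.
        String text = "Some header\nMapping Twitter topic\nnetworks: intro";

        List<String> expected = Arrays.asList(
                "Mapping Twitter topic networks",
                "hub and spokes");

        System.out.println("Missing phrases: " + missingPhrases(text, expected));
    }
}
```

Listing the missing phrases, rather than returning a single true/false, makes it easier to say in a report exactly which part of the page was dropped.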
I checked and the community has already fixed similar bugs in the past.
Any help would be appreciated.