Affects Version/s: 2.0.8
Fix Version/s: None
Component/s: Text extraction
Dear Apache contributors,
I use PDFBox mainly for text extraction. In some cases the word order of the extracted text is incorrect, and line detection can fail as well.
- 1st page: the first letter "D" is not extracted before "uis sit amet..." but at the end of the page;
- 2nd page: "scolaire ferry" is extracted just before "réouverture du musée", which is wrong because the two are not in the same column.
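To make the column problem on the 2nd page concrete, here is a small toy model in plain Java. It is only a sketch and makes no claim about how PDFBox orders text internally: the Fragment class, the placeholder strings L1/L2/R1/R2, the coordinates, and the fixed column boundary are all hypothetical, and y is assumed to grow downward. It shows that sorting purely by position interleaves two columns, while sorting by column first keeps each column together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ReadingOrderSketch {

    // Hypothetical text fragment: a string plus the position of its box.
    // Assumption: y grows downward, so a smaller y is higher on the page.
    static final class Fragment {
        final String text;
        final double x;
        final double y;

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Naive order: top to bottom, then left to right, ignoring columns.
    static List<String> naiveOrder(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y)
                .thenComparingDouble(f -> f.x));
        return texts(sorted);
    }

    // Column-aware order: left column first, then right column.
    // Assumption: the column boundary (columnSplitX) is already known.
    static List<String> columnAwareOrder(List<Fragment> fragments, double columnSplitX) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingInt((Fragment f) -> f.x < columnSplitX ? 0 : 1)
                .thenComparingDouble(f -> f.y)
                .thenComparingDouble(f -> f.x));
        return texts(sorted);
    }

    private static List<String> texts(List<Fragment> fragments) {
        List<String> result = new ArrayList<>();
        for (Fragment f : fragments) {
            result.add(f.text);
        }
        return result;
    }

    // Made-up two-column page: L1/L2 in the left column, R1/R2 in the right.
    static List<Fragment> samplePage() {
        return Arrays.asList(
                new Fragment("R1", 300, 100),
                new Fragment("L2", 50, 120),
                new Fragment("L1", 50, 100),
                new Fragment("R2", 300, 120));
    }

    public static void main(String[] args) {
        System.out.println(naiveOrder(samplePage()));              // [L1, R1, L2, R2]
        System.out.println(columnAwareOrder(samplePage(), 200.0)); // [L1, L2, R1, R2]
    }
}
```

The second ordering is the one I would expect for a two-column page. In a real document the column boundary is of course not known in advance and would have to be detected.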
Handling these cases would be more than welcome.
A.