Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.0.8
- Fix Version/s: None
- Component/s: None
Description
Dear Apache contributors,
I use PDFBox mainly for text extraction. In some cases the word order of the extracted text is incorrect, and line detection can fail as well.
Attachments:
- 1st page: the initial letter "D" is not output before "uis sit amet..." but at the end of the page;
- 2nd page: the text "scolaire ferry" is output immediately before "réouverture du musée", which is wrong because the two are not in the same column.
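To illustrate the second case, here is a minimal, library-independent sketch of one plausible way such interleaving can arise. It does not use the PDFBox API: the `Fragment` class, the coordinates, and the `gutterX` threshold are all made up for illustration and are not taken from the attachment. It compares ordering fragments purely by vertical then horizontal position with grouping them by column first:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class ColumnOrderSketch {

    /** Hypothetical text fragment: position of its top-left corner (y grows downward) and its text. */
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Made-up two-column page: two lines on the left, two on the right. */
    static List<Fragment> sample() {
        return Arrays.asList(
                new Fragment(320, 115, "right 2"),
                new Fragment(50, 100, "left 1"),
                new Fragment(320, 100, "right 1"),
                new Fragment(50, 115, "left 2"));
    }

    /** Orders fragments top to bottom, then left to right, ignoring columns. */
    static List<String> rowByRow(List<Fragment> fragments) {
        return fragments.stream()
                .sorted(Comparator.comparingDouble((Fragment f) -> f.y)
                        .thenComparingDouble(f -> f.x))
                .map(f -> f.text)
                .collect(Collectors.toList());
    }

    /** Orders fragments column by column: everything left of gutterX first, then the rest. */
    static List<String> columnByColumn(List<Fragment> fragments, double gutterX) {
        return fragments.stream()
                .sorted(Comparator.comparingInt((Fragment f) -> f.x < gutterX ? 0 : 1)
                        .thenComparingDouble(f -> f.y))
                .map(f -> f.text)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Columns interleaved: text from the right column lands between left-column lines.
        System.out.println(rowByRow(sample()));
        // Expected reading order: the whole left column, then the whole right column.
        System.out.println(columnByColumn(sample(), 300));
    }
}
```

With the sample data, the row-by-row ordering places "right 1" between "left 1" and "left 2", which is the kind of cross-column adjacency described above. The column-aware ordering assumes the gutter position is known in advance, which a real extractor would have to detect.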
Support for these cases would be very welcome.
A.