Details
- Bug
- Status: Closed
- Major
- Resolution: Not A Bug
- 2.0.1, 2.0.18
- None
- None
Description
TextExtractor mishandles typographic ligatures. I've attached test documents from both Microsoft Word and LibreOffice.
I've compared PDFBox's output against xPDF on CentOS. xPDF handles the ligatures properly, so this appears to be a PDFBox defect.
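When comparing the output of two extractors as described above, it can help to check whether the extracted text contains Unicode ligature code points (for example U+FB01, "fi") or separate letters. The following is a minimal sketch using only the Java standard library. It does not call PDFBox; the class name `LigatureCheck`, its methods, and the sample string are hypothetical stand-ins for real extractor output, and nothing here is claimed to be the cause of the behaviour reported.

```java
import java.text.Normalizer;

public class LigatureCheck {

    // True if the text contains a code point from the Unicode Latin
    // ligature range U+FB00..U+FB06 (ff, fi, fl, ffi, ffl, long s-t, st).
    public static boolean containsLatinLigature(String text) {
        return text.codePoints().anyMatch(cp -> cp >= 0xFB00 && cp <= 0xFB06);
    }

    // NFKC compatibility normalization replaces ligature code points
    // with their constituent letters (U+FB01 becomes "fi").
    public static String expandLigatures(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for text returned by an extractor.
        String extracted = "of\uFB01ce \uFB02ow";
        System.out.println(containsLatinLigature(extracted));
        System.out.println(expandLigatures(extracted));
    }
}
```

If the check reports ligature code points, the two tools may differ only in whether they expand ligatures, and normalizing both outputs before comparing them would show that.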