Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
While looking at the spacing issue reported in PDFBOX-77, I found a separate issue with the placement of diacritic characters in the file 03_2_SSL.pdf, which I have attached here.
The problem is that separate TextPositions are used to render a character and its diacritic. For example, the word Änderung is extracted as And¨ erung when the -sort option is enabled: the diacritic should sit over the A, not after the d. Without -sort, the word is extracted as ¨Anderung. This is still incorrect, because the A and the diacritic should be merged so that together they take up one character's width. This occurs throughout the document.
PDFBox currently does merge diacritic characters, but it assumes that the TextPosition for the diacritic comes after the TextPosition it should be merged with. In this file, the diacritic TextPosition comes first.
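To illustrate the kind of merge being asked for, here is a minimal, self-contained sketch. It is not the PDFBox implementation and does not use the PDFBox TextPosition class: Glyph, DiacriticMergeSketch, the COMBINING table and the overlap test are all hypothetical names and assumptions made up for this example. It attaches a spacing diacritic to a horizontally overlapping base character whether the diacritic arrives before or after it in the list, and composes the pair with java.text.Normalizer.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class DiacriticMergeSketch {

    /** Hypothetical stand-in for a positioned piece of text (not the PDFBox class). */
    static final class Glyph {
        final String text;
        final float x;      // left edge
        final float width;

        Glyph(String text, float x, float width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }

        float right() { return x + width; }
    }

    // Assumed table: spacing diacritics mapped to their combining counterparts.
    private static final Map<String, String> COMBINING = Map.of(
            "\u00A8", "\u0308",  // diaeresis
            "\u00B4", "\u0301",  // acute
            "\u0060", "\u0300",  // grave
            "\u02C6", "\u0302",  // circumflex
            "\u02DC", "\u0303"); // tilde

    static boolean isDiacritic(Glyph g) {
        return COMBINING.containsKey(g.text);
    }

    // True when the horizontal extents of the two glyphs intersect.
    static boolean overlaps(Glyph a, Glyph b) {
        return a.x < b.right() && b.x < a.right();
    }

    // Base character plus combining mark, composed to a single code point where possible.
    static Glyph compose(Glyph base, Glyph mark) {
        String merged = Normalizer.normalize(
                base.text + COMBINING.get(mark.text), Normalizer.Form.NFC);
        return new Glyph(merged, base.x, base.width);
    }

    /** Merges each diacritic with an overlapping neighbour, in either order. */
    static String merge(List<Glyph> glyphs) {
        List<Glyph> out = new ArrayList<>();
        Glyph pending = null; // diacritic seen before its base character
        for (Glyph g : glyphs) {
            if (isDiacritic(g)) {
                Glyph prev = out.isEmpty() ? null : out.get(out.size() - 1);
                if (pending == null && prev != null && overlaps(prev, g)) {
                    // Diacritic follows its base: merge into the previous glyph.
                    out.set(out.size() - 1, compose(prev, g));
                } else {
                    if (pending != null) out.add(pending); // earlier one found no base
                    pending = g;
                }
            } else {
                Glyph base = g;
                if (pending != null) {
                    // Diacritic precedes its base: merge it forward if it overlaps.
                    if (overlaps(pending, g)) base = compose(g, pending);
                    else out.add(pending);
                    pending = null;
                }
                out.add(base);
            }
        }
        if (pending != null) out.add(pending);
        StringBuilder sb = new StringBuilder();
        for (Glyph g : out) sb.append(g.text);
        return sb.toString();
    }
}
```

The sketch only looks at immediate neighbours in list order, so it would not by itself fix the -sort case where the diacritic ends up two positions away from the A. A real fix would also need to compare vertical position and font size.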
Attachments
Issue Links
- relates to PDFBOX-77 PDF-Extraction splits words by spaces (Closed)