Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 2.0.25, 3.0.0 PDFBox
None
None
Description
When two text runs partially overlap, PDFTextStripper seems to return a mix of both, ordered simply by the leftmost x coordinate of each glyph. That makes sense in general, but it could use glyph size to disambiguate "easy" cases like this one.
Currently, this is the first parameter passed to PDFTextStripper.writeString(String string, List<TextPosition> textPositions):
"T0510E09620_S368b3aT92-29fa -4Leef-80I5e-N53c23efE7979f"
I would of course hope for two calls:
"TEST LINE"
"051009620_368b3a92-29fa-4eef-805e-53c23ef7979f"
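To illustrate the idea, here is a minimal sketch of size-based disambiguation. It is not PDFBox code: Glyph is a hypothetical stand-in for TextPosition, and the coordinates and sizes are made up. In a real PDFTextStripper subclass, the same grouping could probably be applied to the textPositions list inside writeString, reading TextPosition.getUnicode(), getXDirAdj() and getFontSizeInPt(), though I have not verified that against this document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GlyphSizeSplitter {

    /** Hypothetical stand-in for a TextPosition: text, leftmost x, font size in points. */
    static final class Glyph {
        final String text;
        final float x;
        final float sizePt;

        Glyph(String text, float x, float sizePt) {
            this.text = text;
            this.x = x;
            this.sizePt = sizePt;
        }
    }

    /** Behaviour described above: glyphs ordered purely by leftmost x coordinate. */
    static String mergeByX(List<Glyph> glyphs) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble(g -> g.x));
        StringBuilder sb = new StringBuilder();
        for (Glyph g : sorted) {
            sb.append(g.text);
        }
        return sb.toString();
    }

    /** Suggested behaviour: bucket glyphs by font size, keeping x order inside each bucket. */
    static List<String> splitBySize(List<Glyph> glyphs) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble(g -> g.x));
        // Key is the size in tenths of a point, so tiny float differences share a bucket.
        Map<Integer, StringBuilder> buckets = new LinkedHashMap<>();
        for (Glyph g : sorted) {
            int key = Math.round(g.sizePt * 10);
            buckets.computeIfAbsent(key, k -> new StringBuilder()).append(g.text);
        }
        List<String> result = new ArrayList<>();
        for (StringBuilder sb : buckets.values()) {
            result.add(sb.toString());
        }
        return result;
    }
}
```

With a large "AB" (20 pt, x = 10 and 30) overlapping a small "12" (8 pt, x = 15 and 20), mergeByX yields "A12B", while splitBySize yields "AB" and "12" as separate strings. This only handles the "easy" case where the overlapping runs differ in size; runs of the same size would still be interleaved.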