Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
2.0.11
-
None
-
None
Description
I have the text "10" and "11" and they get merged into to "1110" word.
Coordinates are:
1 575.36 x 227.4 w 4.447998 h 5.736
1 579.752 x 227.4 w 4.447998 h 5.736
1 526.2 x 227.4 w 4.447998 h 5.736
0 530.59204 x 227.4 w 4.447998 h 5.736
The bug is in in this PDFTextStripper chunk:
{{
// test if our TextPosition starts after a new word would be expected to start
if (expectedStartOfNextWordX != EXPECTED_START_OF_NEXT_WORD_X_RESET_VALUE
&& expectedStartOfNextWordX < positionX &&
// only bother adding a space if the last character was not a space
lastPosition.getTextPosition().getUnicode() != null
&& !lastPosition.getTextPosition().getUnicode().endsWith(" "))
}}
which seems to add a word separator only if the next char is "after" the current word. It never expects that the next char might be "before" the current word.
I guess this could also be framed as a RTL problem, but the PDF is a plain PDF, it just seems that Oracle Reports generates these chunks in the reverse order.