Details
Improvement
Status: Closed
Major
Resolution: Cannot Reproduce
1.8.10
None
None
Patch
Description
A character's positionX can be smaller than the previous character's positionX.
The current PDFTextStripper does not insert a space in this case, so both characters are extracted into a single word.
I attached a patch that detects the positionX < previous positionX case and can insert a space there.
This behavior is disabled by default.
I tested it with 10002 PDF files that had the negative positionX issue, and the extraction result improved for all of them.
In my tests, I set negativeStartXTolerance to 4.
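The check described above can be sketched roughly as follows. This is not the attached patch and not PDFBox API: the class `NegativeXSpaceDetector`, its `enabled` flag, and the `shouldInsertSpace`/`extract` methods are hypothetical names. It assumes that negativeStartXTolerance is measured in the same units as positionX and that a space is inserted only when the current character starts more than that tolerance to the left of the previous one. In PDFBox itself the x positions would come from `TextPosition` objects inside PDFTextStripper.

```java
/**
 * Hypothetical sketch (not PDFBox API): decides whether to insert a space
 * when a character starts to the left of the previous character.
 */
final class NegativeXSpaceDetector {
    // Feature flag; the issue describes the behavior as disabled by default.
    private final boolean enabled;
    // Assumed to be in the same units as the x positions.
    private final float negativeStartXTolerance;

    NegativeXSpaceDetector(boolean enabled, float negativeStartXTolerance) {
        this.enabled = enabled;
        this.negativeStartXTolerance = negativeStartXTolerance;
    }

    /** True if currentX lies more than the tolerance to the left of previousX. */
    boolean shouldInsertSpace(float previousX, float currentX) {
        if (!enabled) {
            return false;
        }
        return previousX - currentX > negativeStartXTolerance;
    }

    /** Joins characters in stream order, inserting a space at large negative X jumps. */
    String extract(char[] glyphs, float[] xs) {
        if (glyphs.length != xs.length) {
            throw new IllegalArgumentException("glyphs and xs must have the same length");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < glyphs.length; i++) {
            if (i > 0 && shouldInsertSpace(xs[i - 1], xs[i])) {
                out.append(' ');
            }
            out.append(glyphs[i]);
        }
        return out.toString();
    }
}
```

For example, with the tolerance set to 4 and x positions `{100, 107, 50, 57}` for the characters `A B C D`, the jump from 107 back to 50 exceeds the tolerance, so an enabled detector yields `"AB CD"`, while a disabled one keeps the old single-word output `"ABCD"`. A small backward jitter such as 100 to 98 stays within the tolerance and does not split the word.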