Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Not A Problem
Affects Version/s: 2.0.4
None
None
Environment: Windows 10 64-bit
Description
We overrode the void writeString(String text, List<TextPosition> textPositions) method of PDFTextStripper to extract additional position and style information from PDFs. We assumed this method would be called once per line and that the List<TextPosition> textPositions parameter would contain every character of that line, including the spaces.
This is indeed the case for thousands of documents. However, for one particular document it is not: writeString is called once per word, and textPositions contains only the letters of that word.
I am not sure if this would be counted as a bug because the final extracted text is not affected.
The problematic PDF is attached.
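
For anyone hitting the same behaviour, one way to make such an override independent of whether writeString receives a word or a whole line is to buffer the positions and treat the stripper's separator callbacks as boundaries. The following is only a sketch under assumptions: LineCollector and its method names are hypothetical, and the type parameter P stands in for TextPosition so that the code compiles without PDFBox. In a real subclass the three callbacks would presumably be invoked from overrides of writeString, writeWordSeparator() and writeLineSeparator() in PDFTextStripper, with a final flush at the end of each page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper that groups per-call chunks into whole lines.
 * P stands in for PDFBox's TextPosition (an assumption made so this
 * sketch has no library dependency).
 */
class LineCollector<P> {
    private final List<List<P>> lines = new ArrayList<>();
    private List<P> current = new ArrayList<>();

    // Intended to be called from writeString(text, textPositions).
    // Works whether the chunk is a single word or an entire line.
    void onString(List<P> positions) {
        current.addAll(positions);
    }

    // Intended to be called from writeWordSeparator(). An inferred space
    // has no position of its own, so the caller supplies a placeholder.
    void onWordSeparator(P placeholder) {
        current.add(placeholder);
    }

    // Intended to be called from writeLineSeparator(): everything
    // buffered so far is treated as one complete line.
    void onLineSeparator() {
        flush();
    }

    // Closes the line in progress, if any. Assumed to be needed once more
    // after the last line, which may not be followed by a line separator.
    void flush() {
        if (!current.isEmpty()) {
            lines.add(current);
            current = new ArrayList<>();
        }
    }

    // Read-only view of the completed lines.
    List<List<P>> lines() {
        return Collections.unmodifiableList(lines);
    }
}
```

With this approach the per-line processing moves out of writeString and into the point where a line is flushed, so both kinds of document yield the same line-level lists.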