Description
When default values were added for averageCharTolerance and spacingTolerance as a part of TIKA-3091, their values appear to have been inadvertently swapped.
From PDFBox:
private float spacingTolerance = .5f;
private float averageCharTolerance = .3f;
From Tika 1.24.1:
//The character width-based tolerance value used to estimate where spaces in text should be added
//Default taken from PDFBox.
private Float averageCharTolerance = 0.5f;
//The space width-based tolerance value used to estimate where spaces in text should be added
//Default taken from PDFBox.
private Float spacingTolerance = 0.3f;
This effective change in defaults has caused PDFParser to start adding more spaces than it did in 1.24 and earlier.
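To illustrate why swapping the two defaults can change how many spaces are emitted, here is a minimal sketch. It is not Tika or PDFBox source code. It assumes a simplified model in which a word separator is inserted when the horizontal gap between two glyphs exceeds the smaller of (space width × spacingTolerance) and (average character width × averageCharTolerance). The class name ToleranceSketch, its methods, and the glyph widths are hypothetical; only the four default values come from the snippets above.

```java
// Hypothetical, simplified model of a space-insertion threshold.
// Not taken from Tika or PDFBox; only the four default values are from the report.
final class ToleranceSketch {
    // Defaults as quoted from PDFBox.
    static final float PDFBOX_SPACING_TOLERANCE = 0.5f;
    static final float PDFBOX_AVERAGE_CHAR_TOLERANCE = 0.3f;

    // Defaults as quoted from Tika 1.24.1 (the two values are swapped).
    static final float TIKA_SPACING_TOLERANCE = 0.3f;
    static final float TIKA_AVERAGE_CHAR_TOLERANCE = 0.5f;

    // Assumed model: the gap threshold is the smaller of the two scaled widths.
    static float gapThreshold(float spaceWidth, float averageCharWidth,
                              float spacingTolerance, float averageCharTolerance) {
        return Math.min(spaceWidth * spacingTolerance,
                        averageCharWidth * averageCharTolerance);
    }

    // Assumed model: a space is inserted when the gap is wider than the threshold.
    static boolean insertsSpace(float gap, float threshold) {
        return gap > threshold;
    }

    public static void main(String[] args) {
        // Made-up widths for a font whose space glyph is narrower than its average glyph.
        float spaceWidth = 250f;
        float averageCharWidth = 500f;
        float gap = 100f;

        float pdfBoxThreshold = gapThreshold(spaceWidth, averageCharWidth,
                PDFBOX_SPACING_TOLERANCE, PDFBOX_AVERAGE_CHAR_TOLERANCE);
        float tikaThreshold = gapThreshold(spaceWidth, averageCharWidth,
                TIKA_SPACING_TOLERANCE, TIKA_AVERAGE_CHAR_TOLERANCE);

        System.out.println("PDFBox defaults insert a space: " + insertsSpace(gap, pdfBoxThreshold));
        System.out.println("Swapped defaults insert a space: " + insertsSpace(gap, tikaThreshold));
    }
}
```

Under this assumed model and these made-up widths, the swapped defaults yield a lower threshold, so a gap that falls between the two thresholds is treated as a word break only with the swapped values. The direction of the effect depends on the font: it holds when the space glyph is narrower than the average character, which is the case assumed here.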
Issue Links
- relates to TIKA-3091 java.lang.NullPointerException when calling hashCode after instantiating PDFParserConfig (Resolved)