Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
On https://wiki.apache.org/tika/PDFParser%20%28Apache%20PDFBox%29 it says:
This method of OCR is triggered by the ocrStrategy parameter, but users can manipulate other parameters, including the image type (see org.apache.pdfbox.rendering.ImageType for options) and the dots per inch dpi. The defaults are: gray and 200 respectively.
The DPI default stated there is incorrect. Both tika/tika-parsers/src/main/resources/org/apache/tika/parser/pdf/PDFParser.properties and tika/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java set ocrDPI to 300, so the wiki should say the defaults are gray and 300 respectively.
The wiki page is read-only (at least for me), so I can't correct it myself.
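
For anyone who wants to double-check the default against the properties file rather than the wiki, here is a minimal sketch that uses only java.util.Properties. The class name OcrDpiCheck, the helper readOcrDpi, and the inline sample text are hypothetical and only for illustration. The exact layout of PDFParser.properties is an assumption; only the key name ocrDPI and the value 300 come from the report above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class OcrDpiCheck {

    // Hypothetical helper: parses properties-style text and returns the
    // ocrDPI value, or the given fallback when the key is absent.
    static int readOcrDpi(String propertiesText, int fallback) {
        Properties props = new Properties();
        try {
            // Properties.load accepts '=', ':' or whitespace between key and value.
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty("ocrDPI");
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        // Assumed stand-in for the relevant line of PDFParser.properties;
        // in practice, pass the real file contents instead.
        String sample = "ocrDPI 300\n";
        System.out.println(readOcrDpi(sample, -1));
    }
}
```

Pointing the helper at the real file contents instead of the sample string would show whichever default the checked-out source actually ships.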