Tika / TIKA-2586

PDFParser documentation has incorrect DPI default


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: documentation
    • Labels: None

    Description

      On https://wiki.apache.org/tika/PDFParser%20%28Apache%20PDFBox%29 it says:

      This method of OCR is triggered by the ocrStrategy parameter, but users can manipulate other parameters, including the image type (see org.apache.pdfbox.rendering.ImageType for options) and the dots per inch dpi. The defaults are: gray and 200 respectively.

      The stated DPI default is incorrect. In both tika/tika-parsers/src/main/resources/org/apache/tika/parser/pdf/PDFParser.properties and tika/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java, the ocrDPI value is set to 300, not 200.
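One way to check which default a given build actually ships is to read the properties file rather than the wiki. The sketch below is a minimal, self-contained illustration, assuming PDFParser.properties uses whitespace-separated `key value` lines that `java.util.Properties` can load. The embedded excerpt and the `OcrDefaults`, `load` and `ocrDpi` names are hypothetical and not copied from the Tika source; only the values (ocrDPI 300, image type gray) come from this issue. To inspect a real build, the same `load` logic could be pointed at the `org/apache/tika/parser/pdf/PDFParser.properties` resource on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class OcrDefaults {
    // Hypothetical excerpt (assumption): mimics the "key value" layout of
    // PDFParser.properties, using the values discussed in this issue.
    static final String EXCERPT =
            "# dots per inch for the OCR rendering of the page image\n"
            + "ocrDPI 300\n"
            + "ocrImageType gray\n";

    // Loads properties text; Properties.load accepts whitespace, '=' or ':'
    // as the key/value separator and skips '#' comment lines.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // StringReader does not perform real I/O, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Hypothetical helper: returns ocrDPI if present, otherwise the given fallback.
    static int ocrDpi(Properties props, int fallback) {
        String raw = props.getProperty("ocrDPI");
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties props = load(EXCERPT);
        System.out.println("ocrDPI = " + ocrDpi(props, -1));
        System.out.println("ocrImageType = " + props.getProperty("ocrImageType"));
    }
}
```

Run against the excerpt, this prints `ocrDPI = 300` and `ocrImageType = gray`; the fallback only applies when the key is missing from the file.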

      The wiki page is not editable (at least by me), so I can't correct it myself.


          People

            Assignee: Unassigned
            Reporter: Ewan Mellor
            Votes: 0
            Watchers: 2
