  Tika / TIKA-3271

Change default image resize size in TesseractParser's pre-processing step


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: None
    • Labels: None

      Description

      If users have ImageMagick installed and they select image preprocessing, one of the things we currently do is tell ImageMagick to enlarge the image by 900%. This may make sense for small images (to be determined); however, it can lead to massive files and dramatic increases in processing time.
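      Why 900% is so expensive: a percentage resize applies to both width and height, so the pixel count grows with the square of the factor. The sketch below works through that arithmetic. It assumes a uniform resize like an ImageMagick "900%" geometry. The 1000x800 input is an arbitrary example, and `resize_cost` is a hypothetical helper, not part of Tika.

```python
from typing import Tuple


def resize_cost(width: int, height: int, percent: int) -> Tuple[int, int, float]:
    """Return (new_width, new_height, pixel_multiplier) for a uniform resize.

    Assumes both dimensions are scaled by `percent` (as with an ImageMagick
    geometry such as "900%") and rounds each dimension to whole pixels.
    """
    new_w = round(width * percent / 100)
    new_h = round(height * percent / 100)
    # Ratio of output pixels to input pixels; roughly (percent / 100) ** 2.
    multiplier = (new_w * new_h) / (width * height)
    return new_w, new_h, multiplier


if __name__ == "__main__":
    # Compare the old 900% default with the proposed 200% default.
    for pct in (900, 200):
        w, h, mult = resize_cost(1000, 800, pct)
        print(f"{pct}%: {w}x{h} px, {mult:.0f}x the original pixel count")
```

      At 900% the image carries 81 times as many pixels as the original. At 200% it carries 4 times as many, which is roughly a 20-fold difference in the data that ImageMagick must write and Tesseract must read.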

      At some point, we should probably choose the resize factor based on the initial image size, i.e., resize dynamically.
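      One possible shape for dynamic resizing is to pick the percentage that brings the image's longer side close to a target length. Small images are enlarged, and large ones are left alone. This is a hypothetical sketch, not Tika behavior. `dynamic_resize_percent` and the 3000-pixel target are illustrative assumptions. The 900% cap simply mirrors the current default.

```python
def dynamic_resize_percent(width: int, height: int,
                           target_long_side: int = 3000,
                           min_percent: int = 100,
                           max_percent: int = 900) -> int:
    """Hypothetical: choose a resize percentage from the input dimensions.

    target_long_side is an assumed illustrative value, not a Tika setting.
    The result is clamped so images are never shrunk (min 100%) and never
    enlarged beyond max_percent (the old 900% default).
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    long_side = max(width, height)
    # Integer percentage that would scale the longer side to the target.
    percent = target_long_side * 100 // long_side
    return max(min_percent, min(max_percent, percent))
```

      For example, a 300x200 thumbnail hits the 900% cap, a 1500x1000 scan gets 200%, and a 4000x3000 photo stays at 100%. Clamping keeps the worst case no worse than today's fixed default.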

      Until then, for Tika 2.0.0, I propose that we change the default to 200%. This value is completely heuristic and not based on much data aside from Peter Kronenberg's work: https://lists.apache.org/thread.html/rb1dece05760d10f1b165b03b97fef8b609dc40c4cd06bdb8cc36469d%40%3Cuser.tika.apache.org%3E


            People

            • Assignee: Unassigned
            • Reporter: Tim Allison (tallison)
            • Votes: 0
            • Watchers: 3
