Tika / TIKA-2509

TesseractOCRParser ignores configured ImageMagickPath in processImage method

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.16, 1.17
    • Fix Version/s: 1.18
    • Component/s: ocr
    • Labels:
      None

      Description

      The TesseractOCRParser class uses the configured ImageMagickPath in its hasImageMagick method to determine whether ImageMagick is present:
      String ImageMagick = config.getImageMagickPath() + getImageMagickProg();

      However, the processImage method ignores the configured path entirely, so ImageMagick must be present on the system path. That defeats the purpose of the ImageMagickPath config setting.

      The doOCR method, by contrast, does use the configured tesseractPath.
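
      To illustrate the behaviour being asked for, here is a minimal sketch of a processImage-style command builder that reuses the same prefix expression hasImageMagick already applies. This is not the Tika code: the ProcessImageSketch class, the Config stand-in, the buildConvertCommand method and its parameters are hypothetical, and getImageMagickProg is assumed to return "convert".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual TesseractOCRParser source.
final class ProcessImageSketch {

    // Stand-in for the parser config; only the getter quoted in this report is modelled.
    static final class Config {
        private final String imageMagickPath;

        Config(String imageMagickPath) {
            this.imageMagickPath = imageMagickPath;
        }

        String getImageMagickPath() {
            return imageMagickPath;
        }
    }

    // Assumption: the program name is "convert" on this platform.
    static String getImageMagickProg() {
        return "convert";
    }

    // Builds the ImageMagick command with the configured path prefixed to the
    // program name, the same way hasImageMagick composes it.
    static List<String> buildConvertCommand(Config config, int density,
                                            String input, String output) {
        List<String> cmd = new ArrayList<>();
        cmd.add(config.getImageMagickPath() + getImageMagickProg());
        cmd.add("-density");
        cmd.add(Integer.toString(density));
        cmd.add(input);
        cmd.add(output);
        return cmd;
    }

    public static void main(String[] args) {
        Config config = new Config("/opt/imagemagick/bin/");
        System.out.println(buildConvertCommand(config, 300, "in.png", "out.png"));
    }
}
```

      With an empty configured path the first element is just the bare program name, so the existing system path lookup would still work as a fallback.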

      Incidentally, there is no equivalent PythonPath config setting, even though the parser attempts to find and use Python.

      Some consistency would be appreciated, so that ImageMagick and Python don't have to be present on the system path. In other words, follow the model already in place for finding and using Tesseract.
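
      One possible shape for that consistency, sketched under assumptions: a single lookup that treats all three external tools the same way. Everything here is hypothetical (the ExternalToolPathsSketch class, the Tool enum, setPath and executable), and a Python path setting does not exist in the affected versions as far as this report can tell.

```java
import java.io.File;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: one resolution rule for every external program.
final class ExternalToolPathsSketch {

    // Program names are assumptions; Windows builds may need an ".exe" suffix.
    enum Tool {
        TESSERACT("tesseract"),
        IMAGEMAGICK("convert"),
        PYTHON("python");

        final String program;

        Tool(String program) {
            this.program = program;
        }
    }

    private final Map<Tool, String> dirs = new EnumMap<>(Tool.class);

    // Records the directory configured for a tool; null is treated as "not configured".
    void setPath(Tool tool, String dir) {
        dirs.put(tool, dir == null ? "" : dir);
    }

    // Returns the configured directory plus program name, or the bare program
    // name (system path lookup) when nothing is configured.
    String executable(Tool tool) {
        String dir = dirs.getOrDefault(tool, "");
        if (dir.isEmpty()) {
            return tool.program;
        }
        boolean hasSeparator = dir.endsWith("/") || dir.endsWith("\\");
        return hasSeparator ? dir + tool.program : dir + File.separator + tool.program;
    }

    public static void main(String[] args) {
        ExternalToolPathsSketch paths = new ExternalToolPathsSketch();
        paths.setPath(Tool.PYTHON, "/usr/local/bin/");
        for (Tool tool : Tool.values()) {
            System.out.println(tool + " -> " + paths.executable(tool));
        }
    }
}
```

      Routing every call site (presence checks and actual invocations alike) through one method like executable would avoid the mismatch described above, where the check honours the setting but the invocation does not.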

              People

              • Assignee: Dave Meikle (davemeikle)
              • Reporter: Richard Jones
              • Votes: 0
              • Watchers: 2
