TIKA-2638

Tika server fails with status 500 if X-Tika-OCRLanguage set to multiple OCR dictionaries

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.18, 1.19
    • Fix Version/s: 1.19.1
    • Component/s: ocr
    • Labels:
      None

      Description

      Tika server 1.18 returns HTTP status 500 when multiple Tesseract OCR dictionaries (delimited by +) are requested through an HTTP header such as "X-Tika-OCRLanguage: eng+fra".

      Setting a single OCR dictionary works.

      Relevant part of the documentation at https://wiki.apache.org/tika/TikaOCR:

      Overriding the configured language as part of your request

      Different requests may need processing using different language models. These can be specified for specific requests using the X-Tika-OCRLanguage custom header. An example of this is shown below:

      curl -T /path/to/tiff/image.jpg http://localhost:9998/tika --header "X-Tika-OCRLanguage: eng"

      Or for multiple languages:

      curl -T /path/to/tiff/image.jpg http://localhost:9998/tika --header "X-Tika-OCRLanguage: eng+fra"
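
      For scripted reproduction, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not part of the Tika documentation: the helper name build_ocr_request is hypothetical, and the URL is taken from the curl examples above. It only builds the PUT request with the languages joined by "+"; how the server responds depends on the Tika version and the installed Tesseract dictionaries.

```python
import urllib.request

# Endpoint taken from the curl examples in the quoted documentation.
TIKA_URL = "http://localhost:9998/tika"


def build_ocr_request(image_bytes, languages, url=TIKA_URL):
    """Build a PUT request carrying the X-Tika-OCRLanguage header.

    Hypothetical helper: `languages` is a list such as ["eng", "fra"],
    joined with "+" the way the header expects multiple dictionaries.
    """
    if not languages:
        raise ValueError("at least one OCR language is required")
    request = urllib.request.Request(url, data=image_bytes, method="PUT")
    request.add_header("X-Tika-OCRLanguage", "+".join(languages))
    return request
```

      Passing the result to urllib.request.urlopen sends it. On an affected version, the multi-language case would be expected to raise urllib.error.HTTPError with code 500, while a single language such as ["eng"] should succeed.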

            People

            • Assignee: Tim Allison (tallison)
            • Reporter: Markus Mandalka (Mandalka)