TIKA-2584: Tika should have a way to pass arbitrary Tesseract options

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.17
    • Fix Version/s: 1.18, 2.0.0
    • Component/s: parser
    • Labels: None

    Description

      Tesseract has a very large number of configuration parameters (run tesseract --print-parameters to list them). Neither TesseractOCRParser nor TesseractOCRConfig provides a mechanism for passing these to Tesseract, so user code cannot control them.

      Tika should pass these through as opaque key-value pairs, so that user code can set them as necessary.
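
      As a rough illustration of the pass-through idea, the sketch below turns a map of opaque key-value pairs into Tesseract command-line arguments using Tesseract's -c VAR=VALUE option. It is not Tika's actual implementation: the class TesseractArgs and the method buildCommand are hypothetical names chosen for this example, and the input and output file names are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Tika): maps opaque key-value pairs to Tesseract CLI arguments. */
final class TesseractArgs {

    /**
     * Builds a command of the form
     *   tesseract INPUT OUTPUTBASE -c key1=value1 -c key2=value2 ...
     * Keys and values are passed through unchanged; Tesseract itself decides
     * whether a given parameter name or value is valid.
     */
    static List<String> buildCommand(String input, String outputBase,
                                     Map<String, String> otherConfig) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(input);
        cmd.add(outputBase);
        for (Map.Entry<String, String> entry : otherConfig.entrySet()) {
            cmd.add("-c");
            cmd.add(entry.getKey() + "=" + entry.getValue());
        }
        return cmd;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the arguments appear in the order they were set.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("tessedit_char_whitelist", "0123456789");
        options.put("preserve_interword_spaces", "1");

        // Only prints the command; running it would require Tesseract to be installed.
        System.out.println(String.join(" ", buildCommand("page.png", "out", options)));
    }
}
```

      Treating the pairs as opaque strings means the wrapper needs no knowledge of individual Tesseract parameters, at the cost of deferring all validation to Tesseract at run time.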

       

            People

              Assignee: Unassigned
              Reporter: Ewan Mellor (ewanmellor-2)
              Votes: 0
              Watchers: 4
