  1. Solr
  2. SOLR-11773

Configurable language setting for Tesseract OCR


Details

    Description

      Currently, to change the language for Tesseract I have to modify the file \org\apache\tika\parser\ocr\TesseractOCRConfig.properties inside tika-parsers-1.16.jar.

      There is no way to set the language in solrconfig.xml or per request to the ExtractingRequestHandler.

      If someone has documents in different languages, it is impossible to configure this. Tesseract will not work as well as it could if the correct language were set.
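      For illustration, the workaround described above (editing the properties file inside the jar) could be scripted roughly as follows. This is only a sketch under assumptions: the key name `language` is assumed to be the one Tika reads from TesseractOCRConfig.properties, and the function names `set_language` and `patch_jar` are hypothetical helpers, not part of Solr or Tika. It also shows the limitation: the setting is global to the jar, not per request.

```python
import re
import zipfile

# Entry name taken from the issue text (jar entries use forward slashes).
PROPS_ENTRY = "org/apache/tika/parser/ocr/TesseractOCRConfig.properties"


def set_language(props_text: str, language: str) -> str:
    """Return props_text with the `language` key set (assumed key name)."""
    new_line = f"language={language}"
    pattern = re.compile(r"^language[ \t]*=[^\r\n]*", flags=re.MULTILINE)
    if pattern.search(props_text):
        # Replace the first existing `language=` line, keeping its line ending.
        return pattern.sub(lambda _m: new_line, props_text, count=1)
    # Key missing: append it on its own line.
    sep = "" if props_text.endswith("\n") or not props_text else "\n"
    return f"{props_text}{sep}{new_line}\n"


def patch_jar(src_jar: str, dst_jar: str, language: str) -> None:
    """Copy src_jar to dst_jar, rewriting only the Tesseract properties entry."""
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == PROPS_ENTRY:
                # latin-1 round-trips every byte, so other lines stay untouched.
                text = data.decode("latin-1")
                data = set_language(text, language).encode("latin-1")
            dst.writestr(item, data)
```

      A call such as `patch_jar("tika-parsers-1.16.jar", "tika-parsers-1.16-deu.jar", "deu")` would write a patched copy. Every document handled by that Solr instance would then be OCR'd with the same language, which is exactly the restriction this issue asks to lift.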


          People

            Assignee: Unassigned
            Reporter: Advokat
            Votes: 1
            Watchers: 2
