Details
Type: Task
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
peterkronenberg, on the user/dev lists and on TIKA-3297 and TIKA-3296, has observed that the tesseract error message for "lang data doesn't exist" is not very clear. We could add a "preloadLangs" option to TesseractOCRParser (default would be false). If set to true, the parser will, upon initialization and if it finds tesseract, call tesseract --list-langs and store the langs it reports. At parse time, if that set of langs has anything in it, the TesseractOCRParser will check the user-requested language against it and throw a clearer exception telling the user that the language data doesn't exist for the requested language.
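A rough sketch of the proposed flow, to make the idea concrete. This is not the TesseractOCRParser code: the class name LangPreloadSketch and its methods are hypothetical, IllegalArgumentException stands in for whatever exception type the parser would actually throw, and the parsing assumes that tesseract --list-langs prints a "List of available languages ..." header followed by one language per line. Splitting the requested language on "+" assumes the usual tesseract syntax for combining languages (e.g. eng+fra).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Tika.
public class LangPreloadSketch {

    // Parse the text printed by `tesseract --list-langs`.
    // Assumption: a header line starting with "List of available languages"
    // is followed by one language code per line.
    public static Set<String> parseListLangs(String output) {
        Set<String> langs = new LinkedHashSet<>();
        for (String line : output.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("List of available languages")) {
                continue;
            }
            langs.add(trimmed);
        }
        return langs;
    }

    // Run tesseract once (e.g. at parser initialization) and collect its langs.
    // stderr is merged into stdout on the assumption that some tesseract
    // versions print the list there.
    public static Set<String> loadLangs(String tesseractCommand)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(tesseractCommand, "--list-langs")
                .redirectErrorStream(true)
                .start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        process.waitFor();
        return parseListLangs(out.toString());
    }

    // Parse-time check: do nothing if no langs were preloaded, otherwise fail
    // with a clear message naming the missing language.
    public static void checkLangs(Set<String> available, String requested) {
        if (available.isEmpty()) {
            return; // preloading disabled, or tesseract was not found
        }
        for (String lang : requested.split("\\+")) {
            if (!available.contains(lang)) {
                throw new IllegalArgumentException(
                        "tesseract language data not found for '" + lang
                                + "'; available languages: " + available);
            }
        }
    }
}
```

With something like this, loadLangs would be called once at initialization when preloadLangs is true, and checkLangs would run before each OCR call. An empty set keeps today's behavior, so the default of false changes nothing for existing users.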