Tika / TIKA-2231

Invalid language code exception

      Description

      TesseractOCRConfig.setLanguage(String language) validates the language being set against a regex. Unfortunately, that regex rejects some language codes that are valid for Tesseract.

      For example:

      TesseractOCRConfig config = new TesseractOCRConfig();
      config.setLanguage("chi_tra");

      This throws an IllegalArgumentException because of the '_' in the language name, even though "chi_tra" (Traditional Chinese) is a valid Tesseract language code.

      The regex needs to be updated to allow the '_' character.
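One possible shape for the relaxed check is sketched below. This is not Tika's actual code: the class name LanguageCodeCheck, its method names, and the pattern itself are assumptions made for illustration, since the report does not quote the existing regex. The sketch assumes a language value is one or more codes made of ASCII letters and single underscores, optionally joined with '+' (the form Tesseract accepts for combining languages, such as "eng+fra").

```java
import java.util.regex.Pattern;

// Hypothetical helper, not Tika's actual code: a sketch of a validation
// pattern that accepts underscores in Tesseract language codes.
final class LanguageCodeCheck {

    // One code: runs of letters joined by single underscores, e.g. "eng", "chi_tra".
    private static final String CODE = "[A-Za-z]+(?:_[A-Za-z]+)*";

    // Full value: one or more codes joined by '+', e.g. "eng+chi_tra".
    private static final Pattern LANGUAGE =
            Pattern.compile(CODE + "(?:\\+" + CODE + ")*");

    // Returns true only if the whole string matches the pattern.
    static boolean isValidLanguage(String language) {
        return language != null && LANGUAGE.matcher(language).matches();
    }

    // Mirrors the behavior described in the report: invalid input is
    // rejected with an IllegalArgumentException.
    static String requireValidLanguage(String language) {
        if (!isValidLanguage(language)) {
            throw new IllegalArgumentException("Invalid language code: " + language);
        }
        return language;
    }

    public static void main(String[] args) {
        System.out.println(isValidLanguage("chi_tra"));      // true
        System.out.println(isValidLanguage("eng+chi_tra"));  // true
        System.out.println(isValidLanguage("chi-tra;rm"));   // false
    }
}
```

Matching the whole string against an allow-list of characters keeps the check strict (nothing like spaces or shell metacharacters gets through) while no longer rejecting codes such as "chi_tra".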

              People

              • Assignee: Unassigned
              • Reporter: Peter Weiss (pmweiss5)
              • Votes: 0
              • Watchers: 4

                  Time Tracking

                  Estimated: 1h
                  Remaining: 1h
                  Logged: Not Specified