Tika / TIKA-2520

OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.16
    • Fix Version/s: 1.19
    • Component/s: server

    Description

      The Tika REST server's `/language` resource calls the relatively expensive `loadModels()` operation on every language detection request:

      LanguageResource.java
      public String detect(final String string) throws IOException {
          // A new detector is created and its language models are loaded again on every call.
          LanguageResult language = new OptimaizeLangDetector().loadModels().detect(string);
          String detectedLang = language.getLanguage();
          LOG.info("Detecting language for incoming resource: [{}]", detectedLang);
          return detectedLang;
      }
      

      This could be optimized by loading the models only once (possibly lazily, on first use) and keeping them in memory. I assume `LanguageDetector` is not thread-safe, so I expect this requires an `ExecutorService` backed by a pool of language detectors.
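      A minimal sketch of the pooling idea, assuming (as the description does) that a detector must not be used by two threads at once. It swaps the `ExecutorService` for a simpler bounded `BlockingQueue` of detectors. `DetectorPool`, `withDetector` and `createdCount` are hypothetical names, not Tika API, and the `Supplier` stands in for `new OptimaizeLangDetector().loadModels()`. Detectors are created lazily, at most `capacity` times in total, and each is borrowed by one request at a time.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical bounded pool of non-thread-safe detectors (a sketch, not Tika API).
 * Each detector is created lazily, so models are loaded at most `capacity` times
 * instead of once per request.
 */
final class DetectorPool<D> {
    private final int capacity;
    private final Supplier<D> factory;            // expensive: creates a detector and loads its models
    private final BlockingQueue<D> idle;          // detectors not currently borrowed
    private final AtomicInteger created = new AtomicInteger();

    DetectorPool(int capacity, Supplier<D> factory) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        this.capacity = capacity;
        this.factory = factory;
        this.idle = new ArrayBlockingQueue<>(capacity);
    }

    /** Borrows a detector, applies the task to it, and always returns it to the pool. */
    <R> R withDetector(Function<D, R> task) throws InterruptedException {
        D detector = acquire();
        try {
            return task.apply(detector);          // only the current thread holds this detector
        } finally {
            idle.offer(detector);                 // cannot overflow: at most `capacity` detectors exist
        }
    }

    /** Number of detectors created so far, i.e. how many times models were loaded. */
    int createdCount() {
        return created.get();
    }

    private D acquire() throws InterruptedException {
        while (true) {
            D d = idle.poll();                    // 1. reuse an idle detector if one is available
            if (d != null) return d;
            d = tryCreate();                      // 2. otherwise create one while under capacity
            if (d != null) return d;
            d = idle.poll(100, TimeUnit.MILLISECONDS); // 3. otherwise wait briefly, then re-check
            if (d != null) return d;
        }
    }

    /** Creates a detector if fewer than `capacity` exist; returns null otherwise. */
    private D tryCreate() {
        int n;
        while ((n = created.get()) < capacity) {
            if (created.compareAndSet(n, n + 1)) { // reserve a slot atomically
                try {
                    return factory.get();
                } catch (RuntimeException | Error e) {
                    created.decrementAndGet();     // release the slot if loading fails
                    throw e;
                }
            }
        }
        return null;
    }
}

// Hypothetical wiring in LanguageResource (not compiled here). loadModels() throws
// IOException, so it is wrapped because Supplier cannot throw checked exceptions:
//   DetectorPool<LanguageDetector> pool = new DetectorPool<>(4, () -> {
//       try { return new OptimaizeLangDetector().loadModels(); }
//       catch (IOException e) { throw new UncheckedIOException(e); }
//   });
//   String lang = pool.withDetector(d -> d.detect(string).getLanguage());
```

      The pool size trades memory for throughput: in this sketch every pooled detector holds its own copy of the models, so `capacity` bounds both the number of model loads and the memory they occupy.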


People

    • Assignee: Chris A. Mattmann (chrismattmann)
    • Reporter: Vincent van Donselaar (vandonselaar)
    • Votes: 0
    • Watchers: 4

Time Tracking

    • Original Estimate: 2h
    • Remaining Estimate: 2h
    • Time Spent: Not Specified