  1. Tika
  2. TIKA-2520

OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.16
    • Fix Version/s: 1.19
    • Component/s: server

    Description

      Tika REST server's `/language` resource invokes the relatively heavy `loadModels` operation for every language detect call:

      LanguageResource.java
      public String detect(final String string) throws IOException {
      	LanguageResult language = new OptimaizeLangDetector().loadModels().detect(string);
      	String detectedLang = language.getLanguage();
      	LOG.info("Detecting language for incoming resource: [{}]", detectedLang);
      	return detectedLang;
      }
      

      This could be optimized by loading the models only once (perhaps lazily) and keeping them in memory. I assume `LanguageDetector` is not thread-safe, so I expect this would require an ExecutorService with its own language detectors.
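
To illustrate the idea, here is a minimal sketch of one possible approach, not the fix that was actually applied. It assumes, as the description does, that a detector must not be shared between threads. `StubDetector` is a hypothetical stand-in for `OptimaizeLangDetector` so that the example runs without Tika on the classpath, and `PooledLanguageDetection` and `LazyDetectorDemo` are likewise invented names. Each worker thread of a fixed-size ExecutorService lazily creates its own detector on first use and then reuses it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a detector with an expensive model load.
class StubDetector {
    // Counts how often the "expensive" load ran, for demonstration only.
    static final AtomicInteger LOADS = new AtomicInteger();

    StubDetector loadModels() {
        LOADS.incrementAndGet(); // pretend this is the heavy step
        return this;
    }

    String detect(String text) {
        // Toy heuristic, only here to make the sketch runnable.
        return text.contains("the ") ? "en" : "unknown";
    }
}

// Hypothetical holder: one lazily created detector per worker thread.
class PooledLanguageDetection implements AutoCloseable {
    // The initializer runs at most once per thread, on first access.
    private final ThreadLocal<StubDetector> perThread =
            ThreadLocal.withInitial(() -> new StubDetector().loadModels());
    private final ExecutorService workers;

    PooledLanguageDetection(int threads) {
        this.workers = Executors.newFixedThreadPool(threads);
    }

    String detect(String text) throws Exception {
        // The detector is only ever touched from a worker thread.
        Callable<String> task = () -> perThread.get().detect(text);
        return workers.submit(task).get();
    }

    @Override
    public void close() {
        workers.shutdown();
    }
}

class LazyDetectorDemo {
    public static void main(String[] args) throws Exception {
        try (PooledLanguageDetection detection = new PooledLanguageDetection(2)) {
            for (int i = 0; i < 100; i++) {
                detection.detect("the quick brown fox");
            }
        }
        // Models are loaded at most once per worker thread, not once per call.
        System.out.println("model loads: " + StubDetector.LOADS.get());
    }
}
```

With this shape, the number of model loads is bounded by the pool size rather than by the number of requests. If the detector turned out to be thread-safe after all, a single shared instance initialized once would be simpler.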


          People

            Assignee: Chris A. Mattmann (chrismattmann)
            Reporter: Vincent van Donselaar (vandonselaar)
            Votes: 0
            Watchers: 4

              Time Tracking

                Original Estimate: 2h
                Remaining Estimate: 2h
                Time Spent: Not Specified