Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
1.16
Description
Tika REST server's `/language` resource invokes the relatively heavy `loadModels` operation for every language detect call:
LanguageResource.java
    public String detect(final String string) throws IOException {
        LanguageResult language = new OptimaizeLangDetector().loadModels().detect(string);
        String detectedLang = language.getLanguage();
        LOG.info("Detecting language for incoming resource: [{}]", detectedLang);
        return detectedLang;
    }
This could be optimized by loading the models only once (possibly lazily) and keeping them in memory. I assume the `LanguageDetector` is not thread-safe, so I expect this requires an ExecutorService that owns a set of language detectors.
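A minimal sketch of that idea, not the fix that was actually committed: detections run on a fixed pool of worker threads, and each worker lazily creates its own detector once and reuses it afterwards. The names `Detector` and `PooledLanguageDetection` are hypothetical stand-ins so the example runs without Tika on the classpath, and the assumption that a detector must not be shared between threads is carried over from the description above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical stand-in for a detector whose models are expensive to load
// and which is assumed NOT to be thread-safe.
interface Detector {
    String detect(String text);
}

// Runs every detection on a fixed pool of worker threads. Each worker
// lazily creates its own detector on first use and reuses it afterwards,
// so the models are loaded at most once per worker instead of once per call.
final class PooledLanguageDetection implements AutoCloseable {
    private final ExecutorService workers;
    private final ThreadLocal<Detector> perThread;

    PooledLanguageDetection(int threads, Supplier<Detector> loader) {
        this.workers = Executors.newFixedThreadPool(threads);
        // withInitial calls the loader the first time a given thread asks
        // for its detector, and never again for that thread.
        this.perThread = ThreadLocal.withInitial(loader);
    }

    String detect(String text) throws InterruptedException, ExecutionException {
        // The task runs on a worker thread, so perThread.get() resolves to
        // that worker's private detector instance.
        Callable<String> task = () -> perThread.get().detect(text);
        return workers.submit(task).get();
    }

    @Override
    public void close() {
        workers.shutdown();
    }
}
```

In the REST resource, the loader would presumably wrap `new OptimaizeLangDetector().loadModels()` and rethrow its `IOException` as an unchecked exception, since `Supplier` cannot throw checked exceptions. If the detector turns out to be thread-safe after all, a single shared instance loaded once would be simpler than a pool.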