Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Description
Hello,
We are using the LangDetect language detection processor with individual field mapping.
<processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
  ...
  <bool name="langid.map">true</bool>
  <bool name="langid.map.individual">true</bool>
</processor>
If a simply structured document is indexed whose multivalued field contains values in different languages, only one language is detected for the whole field.
For example:
<doc>
  <field>This is any text</field>
  <field>Das ist irgendein Text</field>
</doc>
The result is either field_en or field_de, and both values are mapped into that single localized field. As a consequence, some values are not analyzed according to their actual language.
As an enhancement, language detection should be available per value for multivalued fields, so that each value can be mapped individually to the localized field for its own language.
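The difference between the current and the requested behavior can be illustrated with a small, self-contained Java sketch. It does not use Solr or the LangDetect library: `toyDetect` is a hypothetical stopword-counting stand-in for a real detector, and `mapWholeField`/`mapPerValue` are illustrative names, not Solr APIs. The sketch assumes, as the issue describes, that today all values of the multivalued field are run through detection together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

class PerValueLanguageMapping {

    // Hypothetical stand-in for a real detector such as LangDetect:
    // counts a few English and German stopwords and returns the language
    // with more hits (ties fall back to "en").
    static String toyDetect(String text) {
        Set<String> en = Set.of("this", "is", "the", "and", "any", "a", "of");
        Set<String> de = Set.of("das", "ist", "der", "die", "und", "ein", "irgendein");
        int enHits = 0;
        int deHits = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (en.contains(token)) enHits++;
            if (de.contains(token)) deHits++;
        }
        return deHits > enHits ? "de" : "en";
    }

    // Current behavior (as described in the issue): all values are detected
    // together, one language wins, and every value lands in field_<lang>.
    static Map<String, List<String>> mapWholeField(
            String field, List<String> values, Function<String, String> detect) {
        String lang = detect.apply(String.join(" ", values));
        Map<String, List<String>> out = new TreeMap<>();
        out.put(field + "_" + lang, new ArrayList<>(values));
        return out;
    }

    // Requested behavior: detect each value on its own and append it to the
    // localized field for that value's language.
    static Map<String, List<String>> mapPerValue(
            String field, List<String> values, Function<String, String> detect) {
        Map<String, List<String>> out = new TreeMap<>();
        for (String value : values) {
            String target = field + "_" + detect.apply(value);
            out.computeIfAbsent(target, k -> new ArrayList<>()).add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = List.of("This is any text", "Das ist irgendein Text");
        System.out.println(mapWholeField("field", values, PerValueLanguageMapping::toyDetect));
        System.out.println(mapPerValue("field", values, PerValueLanguageMapping::toyDetect));
    }
}
```

With the example document, the whole-field variant prints `{field_en=[This is any text, Das ist irgendein Text]}` (the combined text ties, so the toy detector falls back to English), while the per-value variant prints `{field_de=[Das ist irgendein Text], field_en=[This is any text]}`, which is the mapping this issue asks for.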