Solr / SOLR-13356

Language detection per value


Details

    Description

      Hello,

      We are using the LangDetect language detection processor with individual field mapping.

      <processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
        ...
        <bool name="langid.map">true</bool>
        <bool name="langid.map.individual">true</bool>
      </processor>
      

      If a simply structured document is indexed with values in different languages in a multivalued field, only one language is detected for the entire field.

      For example:

      <doc>
        <field name="field">This is any text</field>
        <field name="field">Das ist irgendein Text</field>
      </doc>
      

      The result is either field_en or field_de, and both values are mapped into that single localized field. As a result, some values are not analyzed according to their actual language.
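
      To illustrate the current behavior, here is a sketch of the indexed result if the detector happens to choose English. The source field name "field" is an assumption inferred from field_en / field_de, and the name attributes are added for illustration only:

      ```xml
      <doc>
        <!-- Both values land in the same localized field -->
        <field name="field_en">This is any text</field>
        <!-- German text, but analyzed with the English field type -->
        <field name="field_en">Das ist irgendein Text</field>
      </doc>
      ```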

      As an enhancement, detection should be available per value on multivalued fields, so that each value can be mapped individually.
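
      A sketch of the result this enhancement would produce for the example document, assuming the source field is named "field" (inferred from field_en / field_de) and that each value is detected correctly:

      ```xml
      <doc>
        <!-- Each value is mapped according to its own detected language -->
        <field name="field_en">This is any text</field>
        <field name="field_de">Das ist irgendein Text</field>
      </doc>
      ```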

          People

            Assignee: Unassigned
            Reporter: Marco Remy (toshokanin)