SOLR-4412: LanguageIdentifier lcmap for language field

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.1
    • Fix Version/s: 4.4, 6.0
    • Component/s: contrib - LangId
    • Labels: None

    Description

      For some languages, a detector reports sub-language codes: LangDetect, for example, returns zh-tw or zh-cn for Chinese, while the Tika detector only returns zh. Today you can use lcmap to map these two codes into one, e.g. langid.map.lcmap=zh-cn:zh zh-tw:zh, but the value written to langField is not changed.

      We need an option for langField as well.
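
To illustrate the requested behavior, here is a minimal Java sketch of how a space-separated list of colon-delimited pairs such as zh-cn:zh zh-tw:zh could be parsed and applied to a detected code before it is written to the language field. The class and method names (LcMapSketch, parseLcMap, normalize) are hypothetical, and this is not the code from SOLR-4412.patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not taken from the Solr LangId contrib.
public class LcMapSketch {

    // Parse a space-separated list of "from:to" pairs, e.g. "zh-cn:zh zh-tw:zh".
    // Malformed entries (no colon, or an empty side) are silently skipped.
    static Map<String, String> parseLcMap(String spec) {
        Map<String, String> map = new HashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return map;
        }
        for (String pair : spec.trim().split("\\s+")) {
            String[] kv = pair.split(":", 2);
            if (kv.length == 2 && !kv[0].isEmpty() && !kv[1].isEmpty()) {
                map.put(kv[0], kv[1]);
            }
        }
        return map;
    }

    // Return the mapped code if one exists, otherwise the detected code unchanged.
    static String normalize(String detected, Map<String, String> lcmap) {
        return lcmap.getOrDefault(detected, detected);
    }

    public static void main(String[] args) {
        Map<String, String> lcmap = parseLcMap("zh-cn:zh zh-tw:zh");
        System.out.println(normalize("zh-tw", lcmap)); // prints "zh"
        System.out.println(normalize("en", lcmap));    // prints "en"
    }
}
```

With such a mapping applied to the language field value, documents detected as zh-cn or zh-tw by LangDetect would carry the same zh code that the Tika detector produces.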

      Attachments

        1. SOLR-4412.patch
          8 kB
          Jan Høydahl

          People

            Assignee: Jan Høydahl (janhoy)
            Reporter: Jan Høydahl (janhoy)
            Votes: 0
            Watchers: 4
