SOLR-4412

LanguageIdentifier lcmap for language field

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.1
    • Fix Version/s: 4.4, Trunk
    • Component/s: contrib - LangId
    • Labels: None

      Description

      For some languages the detector reports regional variants rather than the base language: LangDetect, for example, returns zh-tw or zh-cn for Chinese, whereas the Tika detector only returns zh. Today you can use lcmap to map both variants to a single code, e.g. langid.map.lcmap=zh-cn:zh zh-tw:zh, but this does not change the value written to langField.

      We need an equivalent mapping option for langField as well.
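
The requested behaviour can be illustrated with a minimal Python sketch. It is not Solr's implementation and does not reflect the attached patch; the function names `parse_lcmap` and `normalize_language` are hypothetical. It only assumes the mapping format shown above (space-separated `from:to` pairs) and shows the detected code being normalized before it would be written to the language field.

```python
def parse_lcmap(spec):
    """Parse a space-separated list of from:to pairs into a dict."""
    mapping = {}
    for pair in spec.split():
        source, sep, target = pair.partition(":")
        # Reject entries such as "zh-cn", ":zh" or "zh-cn:".
        if not sep or not source or not target:
            raise ValueError(f"malformed mapping entry: {pair!r}")
        mapping[source] = target
    return mapping


def normalize_language(detected, mapping):
    """Return the mapped code, or the detected code unchanged if unmapped."""
    return mapping.get(detected, detected)


# Same mapping string as in the description above.
lcmap = parse_lcmap("zh-cn:zh zh-tw:zh")
for code in ("zh-cn", "zh-tw", "en"):
    print(code, "->", normalize_language(code, lcmap))
```

With such a mapping, both zh-cn and zh-tw would be stored as zh, while codes without an entry (en here) pass through unchanged.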

      1. SOLR-4412.patch
        8 kB
        Jan Høydahl

        Activity

        Steve Rowe made changes -
        Status: Resolved [ 5 ] → Closed [ 6 ]
        Jan Høydahl made changes -
        Issue Type: Bug [ 1 ] → Improvement [ 4 ]
        Jan Høydahl made changes -
        Status: Open [ 1 ] → Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Jan Høydahl made changes -
        Fix Version/s 5.0 [ 12321664 ]
        Jan Høydahl made changes -
        Comment [ Commit 1498951 from janhoy@apache.org
        [ https://svn.apache.org/r1498951 ]

        SOLR-4412: Added comments about variant to schema.xml (merge from trunk) ]
        Jan Høydahl made changes -
        Comment [ Commit 1498948 from janhoy@apache.org
        [ https://svn.apache.org/r1498948 ]

        SOLR-4412: Added comments about variant to schema.xml ]
        Uwe Schindler made changes -
        Fix Version/s 4.4 [ 12324324 ]
        Fix Version/s 4.3 [ 12324128 ]
        Jan Høydahl made changes -
        Assignee Jan Høydahl [ janhoy ]
        Jan Høydahl made changes -
        Attachment SOLR-4412.patch [ 12573301 ]
        Robert Muir made changes -
        Fix Version/s 4.3 [ 12324128 ]
        Fix Version/s 5.0 [ 12321664 ]
        Fix Version/s 4.2 [ 12323893 ]
        Jan Høydahl made changes -
        Field Original Value New Value
        Description: inline wiki markup changed from triple braces, e.g. {{{lcmap}}}, to double braces, e.g. {{lcmap}}; the text itself was not changed.
        Jan Høydahl created issue -

          People

          • Assignee: Jan Høydahl
          • Reporter: Jan Høydahl
          • Votes: 0
          • Watchers: 4
