  1. Lucene - Core
  2. LUCENE-5224

org.apache.lucene.analysis.hunspell.HunspellDictionary should implement ICONV and OCONV lines in the affix file

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0, 4.4
    • Fix Version/s: 4.8, 6.0
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Some Hunspell dictionaries need to emulate Unicode normalization and collation to produce the correct stem of a word. The original Hunspell supports this through the ICONV and OCONV lines in the affix file, which define input and output conversion tables. The Lucene HunspellDictionary currently ignores these lines.

      Please support these keys in the affix file.

      This functionality is briefly described in the hunspell man page: http://manpages.ubuntu.com/manpages/lucid/man4/hunspell.4.html

      This functionality is practically required in order to use a Korean dictionary, because stemming needs to work with only some of the jamo of a Hangul character (grapheme cluster). Other languages will also benefit from it.
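
      To illustrate why jamo-level access matters, here is a minimal sketch using the JDK's java.text.Normalizer. It is not part of Hunspell or Lucene; the class name HangulJamoDemo and its method names are hypothetical. It shows that canonical decomposition (NFD) splits a precomposed Hangul syllable into its conjoining jamo, and that canonical composition (NFC) puts them back together.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration only; not Lucene or Hunspell code.
class HangulJamoDemo {

    /** Decomposes precomposed Hangul syllables into conjoining jamo (NFD). */
    static String toJamo(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    /** Recomposes conjoining jamo into precomposed Hangul syllables (NFC). */
    static String toSyllables(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    /** Renders each code point as U+XXXX so visually identical strings can be told apart. */
    static String codePoints(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }
}
```

      For the syllable U+AC01, toJamo yields the three jamo U+1100 U+1161 U+11A8, and toSyllables maps them back to U+AC01. A stemmer that sees the decomposed form can strip or match a trailing consonant on its own, which is not possible while the syllable is a single code point.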

      Here is an example for a .aff file:

      ICONV 각 각
      ...
      OCONV 각 각
      

      Here is the same example escaped.

      ICONV \uAC01 \u1100\u1161\u11A8
      ...
      OCONV \u1100\u1161\u11A8 \uAC01
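
      The requested behavior could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the attached patch and not Hunspell's own implementation: the class ConvTableSketch and all of its methods are hypothetical, the handling of backslash-u escapes only mirrors the escaped example above, and longest-match-first replacement is an assumed matching strategy. The hunspell man page also describes a leading count line (for example "ICONV 1"), which the example above omits; the sketch skips such lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an ICONV/OCONV conversion table; not Lucene's implementation.
class ConvTableSketch {
    private final Map<String, String> rules = new LinkedHashMap<>();

    /** Expands backslash-u escapes (four hex digits) into the characters they name. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (s.startsWith("\\u", i) && i + 6 <= s.length()) {
                // Malformed hex digits would throw NumberFormatException here.
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 5; // skip the rest of the escape
            } else {
                out.append(s.charAt(i));
            }
        }
        return out.toString();
    }

    /** Collects "KEYWORD pattern replacement" lines for the given keyword (ICONV or OCONV). */
    static ConvTableSketch parse(String affText, String keyword) {
        ConvTableSketch table = new ConvTableSketch();
        for (String line : affText.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            // Three fields form a rule; a two-field count header is skipped.
            if (parts.length == 3 && parts[0].equals(keyword)) {
                table.rules.put(unescape(parts[1]), unescape(parts[2]));
            }
        }
        return table;
    }

    /** Replaces the longest matching pattern at each position; copies unmatched characters. */
    String apply(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            String best = null;
            for (String pattern : rules.keySet()) {
                if (input.startsWith(pattern, i)
                        && (best == null || pattern.length() > best.length())) {
                    best = pattern;
                }
            }
            if (best == null) {
                out.append(input.charAt(i));
                i++;
            } else {
                out.append(rules.get(best));
                i += best.length();
            }
        }
        return out.toString();
    }
}
```

      In this sketch, a table parsed with the keyword ICONV would be applied to a word before stemming, and a table parsed with OCONV would be applied to each resulting stem. With the two rules from the example, the input conversion turns U+AC01 into U+1100 U+1161 U+11A8 and the output conversion reverses it.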
      

        Attachments

        1. LUCENE-5224.patch
          25 kB
          Robert Muir
        2. LUCENE-5224.patch
          23 kB
          Robert Muir
        3. LUCENE-5224.patch
          23 kB
          Robert Muir

            People

            • Assignee:
              rcmuir Robert Muir
              Reporter:
              grhoten George Rhoten
            • Votes:
              0
              Watchers:
              3
