  1. Lucene - Core
  2. LUCENE-8553

New KoreanDecomposeFilter for KoreanAnalyzer(Nori)

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      This patch adds KoreanDecomposeFilter.

      The filter decomposes Hangul syllables into their component letters.
      (ex) 한글 -> ㅎㄱ or ㅎㅏㄴㄱㅡㄹ

      Hangul is typed differently from alphabetic scripts.

      If you want to type "apple" in English,
         you type it in the order a -> p -> p -> l -> e.

      However, if you want to type "한글" (Hangul) in Hangul,
         you have to type it in the order ㅎ -> ㅏ -> ㄴ -> ㄱ -> ㅡ -> ㄹ,
         because the keyboard has one key per letter, not per syllable.

      This means that spell checking against whole Hangul syllables can be less accurate than checking against the letters as they are typed.
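
      To make the spell-check argument concrete, here is a minimal sketch that compares two words once as whole syllables and once as decomposed letters. It is not taken from the patch: the class name JamoDistance and its methods are hypothetical, and it uses java.text.Normalizer (NFD), which yields Unicode conjoining jamo rather than the compatibility jamo shown in this description.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of the patch: compares edit distance at
// syllable level and at jamo level.
final class JamoDistance {

    // NFD normalization splits each precomposed Hangul syllable into
    // conjoining jamo (initial, medial, optional final).
    static String toJamo(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    // Classic Levenshtein distance over UTF-16 code units, two rows of DP.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Fraction of the longer string that had to be edited.
    static double mismatchRatio(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 0.0 : (double) levenshtein(a, b) / longest;
    }
}
```

      For "한글" and the one-keystroke typo "한긍", the syllable-level comparison sees one of two units changed, while the jamo-level comparison (after toJamo) sees one of six, so a jamo-based spell checker can rank the typo as much closer to the intended word.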

       

      A Hangul syllable is built from up to three elements: "Choseong" (initial consonant), "Jungseong" (medial vowel), and "Jongseong" (optional final consonant).

      These elements are collectively called "Jamo".

      For example, in the Korean word "된장찌개" (soybean paste stew),
      the "Choseong" are "ㄷ, ㅈ, ㅉ, ㄱ",
      the "Jungseong" are "ㅚ, ㅏ, ㅣ, ㅐ",
      and the "Jongseong" are "ㄴ, ㅇ".

      Jamo separation is needed for spell checking, as explained above.
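
      The Jamo separation itself can be sketched with plain Unicode arithmetic, since precomposed syllables (U+AC00 to U+D7A3) are laid out as 19 initials x 21 medials x 28 finals (including "no final"). This is an illustrative sketch, not the code in the patch; the class name HangulJamo and its members are hypothetical.

```java
// Hypothetical helper, not part of the patch: splits precomposed Hangul
// syllables into compatibility jamo using their Unicode layout.
final class HangulJamo {
    // Jamo listed in the order used by the Unicode syllable block.
    static final String CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ";          // 19 initials
    static final String JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ";     // 21 medials
    static final String JONGSEONG = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"; // 27 finals

    static boolean isSyllable(char c) {
        return c >= 0xAC00 && c <= 0xD7A3;
    }

    static String decompose(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (!isSyllable(c)) {
                out.append(c); // leave non-Hangul characters untouched
                continue;
            }
            int index = c - 0xAC00;
            out.append(CHOSEONG.charAt(index / (21 * 28)));
            out.append(JUNGSEONG.charAt((index % (21 * 28)) / 28));
            int jong = index % 28; // 0 means the syllable has no final consonant
            if (jong > 0) {
                out.append(JONGSEONG.charAt(jong - 1));
            }
        }
        return out.toString();
    }
}
```

      With these tables, HangulJamo.decompose("된장찌개") returns "ㄷㅚㄴㅈㅏㅇㅉㅣㄱㅐ".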

      A "Choseong Filter" is also needed because many Koreans use "Choseong Search" (especially on mobile devices).
      Searching for "된장찌개" requires typing 10 Jamo, which is quite a lot.
      For that reason, I think it would be useful to provide a filter that lets the word be found by typing just "ㄷㅚㄴㅈㅏㅇㅉㅣㄱㅐ" in its shortened form, "ㄷㅈㅉㄱ".
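
      As a rough illustration of "Choseong Search", the sketch below reduces each term to its initial consonants and matches a typed query against that key by prefix. The class ChoseongSearch, its methods, and the prefix-matching rule are assumptions made for this example; in Lucene the matching would be done by the index, not by a loop like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the patch: matches terms by their
// initial consonants only.
final class ChoseongSearch {
    // The 19 initial consonants in Unicode syllable-block order.
    static final String CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ";

    // Replaces every precomposed syllable by its initial consonant.
    static String choseongOf(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c >= 0xAC00 && c <= 0xD7A3) {
                out.append(CHOSEONG.charAt((c - 0xAC00) / (21 * 28)));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Returns the terms whose Choseong key starts with the typed query.
    static List<String> search(String query, List<String> terms) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (choseongOf(term).startsWith(query)) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

      Here ChoseongSearch.choseongOf("된장찌개") gives "ㄷㅈㅉㄱ", so the four-key query finds the word.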

      Hangul also has dual (compound) characters, such as
      "ㄲ, ㄸ, ㅃ, ㅆ, ㅉ, ㅚ (ㅗ + ㅣ), ㅢ (ㅡ + ㅣ), ...".

      For these reasons,
      KoreanDecomposeFilter offers 5 options.

      ex) 된장찌개 is tokenized as [된장][찌개], and each option then emits:

      1) ORIGIN
      [된장], [찌개]

      2) SINGLECHOSEONG
      [ㄷㅈ], [ㅉㄱ]

      3) DUALCHOSEONG
      [ㄷㅈ], [ㅈㅈㄱ]

      4) SINGLEJAMO
      [ㄷㅚㄴㅈㅏㅇ], [ㅉㅣㄱㅐ]

      5) DUALJAMO
      [ㄷㅗㅣㄴㅈㅏㅇ], [ㅈㅈㅣㄱㅐ]
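
      The five options can be modelled as a small enum plus one function that is applied to each token. The sketch below reproduces the outputs listed above, but it is only an illustration: the class KoreanDecomposeSketch, the Mode enum, and especially the table of which dual characters are split are assumptions, and the actual patch (a Lucene TokenFilter) may be organized differently.

```java
// Hypothetical sketch, not the code in the patch: applies one of the
// five decomposition options to a single token.
final class KoreanDecomposeSketch {
    enum Mode { ORIGIN, SINGLECHOSEONG, DUALCHOSEONG, SINGLEJAMO, DUALJAMO }

    static final String CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ";
    static final String JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ";
    static final String JONG = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ";

    // Assumed split table: DUAL.charAt(i) expands to PARTS[i].
    static final String DUAL = "ㄲㄸㅃㅆㅉㅘㅙㅚㅝㅞㅟㅢ";
    static final String[] PARTS = {
        "ㄱㄱ", "ㄷㄷ", "ㅂㅂ", "ㅅㅅ", "ㅈㅈ", "ㅗㅏ", "ㅗㅐ", "ㅗㅣ", "ㅜㅓ", "ㅜㅔ", "ㅜㅣ", "ㅡㅣ"
    };

    static String decompose(String token, Mode mode) {
        if (mode == Mode.ORIGIN) {
            return token;
        }
        boolean choseongOnly = mode == Mode.SINGLECHOSEONG || mode == Mode.DUALCHOSEONG;
        boolean splitDual = mode == Mode.DUALCHOSEONG || mode == Mode.DUALJAMO;

        // Step 1: syllables -> jamo (all jamo, or initials only).
        StringBuilder jamo = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (c < 0xAC00 || c > 0xD7A3) {
                jamo.append(c); // not a precomposed syllable
                continue;
            }
            int index = c - 0xAC00;
            jamo.append(CHO.charAt(index / (21 * 28)));
            if (choseongOnly) {
                continue;
            }
            jamo.append(JUNG.charAt((index % (21 * 28)) / 28));
            int jong = index % 28;
            if (jong > 0) {
                jamo.append(JONG.charAt(jong - 1));
            }
        }
        if (!splitDual) {
            return jamo.toString();
        }

        // Step 2: expand dual characters into their parts.
        StringBuilder out = new StringBuilder();
        for (char c : jamo.toString().toCharArray()) {
            int i = DUAL.indexOf(c);
            out.append(i >= 0 ? PARTS[i] : String.valueOf(c));
        }
        return out.toString();
    }
}
```

      For instance, KoreanDecomposeSketch.decompose("찌개", Mode.DUALCHOSEONG) returns "ㅈㅈㄱ", matching option 3 above.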

       

      Attachments

        1. LUCENE-8553.patch
          29 kB
          Namgyu Kim

          People

            Assignee: Unassigned
            Reporter: Namgyu Kim (danmuzi)
            Votes: 0
            Watchers: 3