Lucene - Core / LUCENE-3921

Add decompose compound Japanese Katakana token capability to Kuromoji

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels:
    • Environment:

      Cent OS 5, IPA Dictionary, Run with "Search mode"

    • Lucene Fields:
      New

      Description

      Kuromoji, the Japanese morphological analyzer, cannot decompose every compound Katakana token into sub-tokens. Some Katakana tokens are decomposed, but the decomposition is not applied to every Katakana compound. For instance, "トートバッグ" (tote bag) and "ショルダーバッグ" (shoulder bag) are not decomposed into "トート バッグ" and "ショルダー バッグ", even though the IPA dictionary has an entry for "バッグ". I would like the decomposition to apply to every Katakana token whose sub-tokens are in the dictionary, or an option that forces decomposition of every Katakana token.
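
      To make the requested behavior concrete, here is a minimal sketch of dictionary-driven decomposition. It is not how Kuromoji works internally (Kuromoji segments text with a cost-based lattice), and the class name KatakanaDecompounder, its methods, and the small word set used below are all hypothetical. The sketch assumes that every sub-token, including "トート" and "ショルダー", is present in the dictionary, and it splits an all-Katakana token into the fewest dictionary entries that cover it completely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Kuromoji or Lucene. */
final class KatakanaDecompounder {

    private KatakanaDecompounder() {}

    /** True if s is non-empty and every char is in the Unicode Katakana block (U+30A0..U+30FF). */
    static boolean isKatakana(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '\u30A0' || c > '\u30FF') {
                return false;
            }
        }
        return true;
    }

    /**
     * Splits token into the fewest dictionary entries (at least two) that cover it
     * completely. Returns the token unchanged, as a one-element list, if it is not
     * all Katakana or if no such split exists.
     */
    static List<String> decompose(String token, Set<String> dictionary) {
        if (!isKatakana(token)) {
            return Collections.singletonList(token);
        }
        int n = token.length();
        int[] best = new int[n + 1]; // best[i]: fewest pieces covering token[0, i)
        int[] prev = new int[n + 1]; // prev[i]: start index of the last piece ending at i
        Arrays.fill(best, Integer.MAX_VALUE);
        best[0] = 0;
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                boolean wholeToken = (start == 0 && end == n); // a "split" needs two or more pieces
                if (!wholeToken
                        && best[start] != Integer.MAX_VALUE
                        && best[start] + 1 < best[end]
                        && dictionary.contains(token.substring(start, end))) {
                    best[end] = best[start] + 1;
                    prev[end] = start;
                }
            }
        }
        if (best[n] == Integer.MAX_VALUE) {
            return Collections.singletonList(token); // no full cover: keep the token whole
        }
        List<String> parts = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            parts.add(token.substring(prev[end], end));
        }
        Collections.reverse(parts);
        return parts;
    }
}
```

      With a dictionary holding "トート", "ショルダー", and "バッグ", decompose("トートバッグ", dictionary) yields the two sub-tokens "トート" and "バッグ", while "バッグ" alone is returned whole. Preferring the fewest pieces is one possible policy; a real analyzer would more likely weigh candidate splits by dictionary costs.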

        Activity

        Kazuaki Hiraga created issue
        Kazuaki Hiraga made changes:
        Field: Environment
        Original Value: Cent OS 5, IPA Dictionary
        New Value: Cent OS 5, IPA Dictionary, Run with "Search mode"

          People

          • Assignee: Unassigned
          • Reporter: Kazuaki Hiraga
          • Votes: 0
          • Watchers: 3
