Lucene - Core
  LUCENE-2522

add simple Japanese tokenizer, based on TinySegmenter

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 5.0
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New, Patch Available

      Description

      TinySegmenter (http://www.chasen.org/~taku/software/TinySegmenter/) is a tiny Japanese segmenter.

      It was ported to Java/Lucene by Kohei TAKETA <k-tak@void.in>,
      and is available under friendly license terms (BSD; some files explicitly disclaim copyright to the source code, giving a blessing instead).

      Koji knows the author, and has already contacted him about incorporating it into Lucene:

      I've contacted Taketa-san, who is the creator of the Java version of
      TinySegmenter. He said he is happy if his program is part of Lucene.
      He is a co-author of my book about Solr published in Japan, BTW. ;-)
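
      For illustration, below is a minimal sketch of how such a segmenter might plug into Lucene's analysis chain as a Tokenizer. It is not the attached patch: the class name TinySegmenterTokenizer and the TinySegmenter.segment(...) entry point are assumptions standing in for whatever API the Java port actually exposes, and it assumes the newer Tokenizer API where the reader is supplied externally via setReader().

      import java.io.IOException;
      import java.util.Iterator;
      import java.util.List;

      import org.apache.lucene.analysis.Tokenizer;
      import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
      import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

      public final class TinySegmenterTokenizer extends Tokenizer {

        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
        private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);

        private Iterator<String> segments; // segments of the current input, produced lazily
        private int position;              // character offset of the next segment

        @Override
        public boolean incrementToken() throws IOException {
          if (segments == null) {
            // TinySegmenter works on a complete string, so read the whole
            // field value up front rather than streaming it.
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[1024];
            int n;
            while ((n = input.read(buffer)) != -1) {
              sb.append(buffer, 0, n);
            }
            // Hypothetical call: the real port's entry point may be named differently.
            List<String> result = TinySegmenter.segment(sb.toString());
            segments = result.iterator();
            position = 0;
          }
          if (!segments.hasNext()) {
            return false;
          }
          clearAttributes();
          String segment = segments.next();
          termAtt.setEmpty().append(segment);
          // Offsets assume the segments concatenate back to the original text.
          offsetAtt.setOffset(correctOffset(position),
                              correctOffset(position + segment.length()));
          position += segment.length();
          return true;
        }

        @Override
        public void reset() throws IOException {
          super.reset();
          segments = null;
          position = 0;
        }

        @Override
        public void end() throws IOException {
          super.end();
          // Final offset points past the last character that was consumed.
          offsetAtt.setOffset(correctOffset(position), correctOffset(position));
        }
      }

      In practice such a Tokenizer would be wrapped in an Analyzer, possibly together with standard filters such as LowerCaseFilter, but that wiring depends on the patch itself.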
      
      Attachments

      1. LUCENE-2522.patch (125 kB), Robert Muir
      2. LUCENE-2522.patch (94 kB), Robert Muir
      3. LUCENE-2522.patch (56 kB), Robert Muir

      People

      • Assignee: Unassigned
      • Reporter: Robert Muir
      • Votes: 0
      • Watchers: 2
