Lucene - Core / LUCENE-2522

Add simple Japanese tokenizer, based on TinySegmenter

Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 6.0
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      TinySegmenter (http://www.chasen.org/~taku/software/TinySegmenter/) is a tiny Japanese segmenter.

      It was ported to Java/Lucene by Kohei TAKETA <k-tak@void.in>
      and is under friendly license terms (BSD; some files explicitly disclaim copyright to the source code, giving a blessing instead).

      Koji knows the author and has already contacted him about incorporating it into Lucene:

      I've contacted Takeda-san, who is the creator of the Java version of
      TinySegmenter. He said he would be happy if his program were part of Lucene.
      He is a co-author of my book about Solr published in Japan, BTW. ;-)
      

      Attachments

        1. LUCENE-2522.patch
          56 kB
          Robert Muir
        2. LUCENE-2522.patch
          94 kB
          Robert Muir
        3. LUCENE-2522.patch
          125 kB
          Robert Muir


          People

            Assignee: Unassigned
            Reporter: Robert Muir (rcmuir)
            Votes: 0
            Watchers: 2
