Lucene - Core / LUCENE-2522

add simple japanese tokenizer, based on tinysegmenter

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 6.0
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New, Patch Available

      Description

      TinySegmenter (http://www.chasen.org/~taku/software/TinySegmenter/) is a tiny Japanese segmenter.

      It was ported to Java/Lucene by Kohei TAKETA <k-tak@void.in> and is available under friendly license terms (BSD; some files explicitly disclaim copyright to the source code, giving a blessing instead).
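
      The attached patch is not reproduced here, so purely as a rough illustration, the sketch below shows one way a segmenter of this kind could sit behind the Lucene 4.x Tokenizer API: buffer the field, hand it to the segmenter, and emit the resulting pieces as terms with cumulative offsets. The SimpleJapaneseTokenizer class name, the Segmenter interface, and the whole-input buffering are assumptions made for the example, not the API of the patch or of the TinySegmenter port.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

/**
 * Illustrative sketch only: wraps a hypothetical Segmenter behind the
 * Lucene 4.x Tokenizer API. Names and structure are assumptions, not
 * what the attached patch actually does.
 */
public final class SimpleJapaneseTokenizer extends Tokenizer {

  /** Hypothetical contract: split text into contiguous surface forms. */
  public interface Segmenter {
    List<String> segment(String text);
  }

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
  private final Segmenter segmenter;

  private Iterator<String> pieces; // segments of the current input
  private int pos;                 // running char offset into the input

  public SimpleJapaneseTokenizer(Reader input, Segmenter segmenter) {
    super(input);
    this.segmenter = segmenter;
  }

  @Override
  public boolean incrementToken() throws IOException {
    clearAttributes();
    if (pieces == null) {
      // Buffer the whole field and segment it in one shot; fine for a
      // sketch, a production tokenizer would work incrementally.
      StringBuilder sb = new StringBuilder();
      char[] buf = new char[1024];
      for (int n = input.read(buf); n != -1; n = input.read(buf)) {
        sb.append(buf, 0, n);
      }
      pieces = segmenter.segment(sb.toString()).iterator();
      pos = 0;
    }
    if (!pieces.hasNext()) {
      return false;
    }
    // Assumes the segmenter returns contiguous substrings of the input,
    // so offsets can simply be accumulated.
    String piece = pieces.next();
    termAtt.append(piece);
    offsetAtt.setOffset(correctOffset(pos), correctOffset(pos + piece.length()));
    pos += piece.length();
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pieces = null;
    pos = 0;
  }
}
```

      A real integration would presumably register such a tokenizer through an Analyzer's createComponents(), like the other tokenizers in modules/analysis.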

      Koji knows the author and has already contacted him about incorporating it into Lucene:

      I've contacted Takeda-san, who is the creator of the Java version of
      TinySegmenter. He said he is happy if his program is part of Lucene.
      He is a co-author of my book about Solr published in Japan, BTW. ;-)
      

        Attachments

        1. LUCENE-2522.patch (125 kB, Robert Muir)
        2. LUCENE-2522.patch (94 kB, Robert Muir)
        3. LUCENE-2522.patch (56 kB, Robert Muir)


            People

            • Assignee: Unassigned
            • Reporter: Robert Muir (rcmuir)
            • Votes: 0
            • Watchers: 2
