  Lucene - Core
  LUCENE-6111

Add Chinese Word Segmentation Analyzer with Ansj implementation

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 4.6
    • Fix Version/s: 4.6
    • Component/s: modules/analysis
    • Lucene Fields: New, Patch Available

    Description

      When I use mahout-0.9, which depends on lucene-4.6, to run the k-means clustering algorithm, I find that the default analyzer, org.apache.lucene.analysis.standard.StandardAnalyzer, handles Chinese text poorly: it only splits the text into single characters. The Ansj Chinese word segmentation tool is widely used for tokenizing Chinese documents, so I would like to add an Ansj-based analyzer to Lucene.
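
      To illustrate the difference, the following is a minimal sketch (not part of the attached patch) that prints the tokens Lucene 4.6's StandardAnalyzer produces for a Chinese phrase. The Ansj output shown in the comments is an assumption about what a dictionary-based segmenter would emit; an Ansj-backed analyzer contributed here could be exercised the same way.

      import java.io.IOException;
      import java.io.StringReader;

      import org.apache.lucene.analysis.Analyzer;
      import org.apache.lucene.analysis.TokenStream;
      import org.apache.lucene.analysis.standard.StandardAnalyzer;
      import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
      import org.apache.lucene.util.Version;

      public class ChineseTokenizationDemo {

          // Print every token the analyzer produces for the given text.
          static void printTokens(Analyzer analyzer, String text) throws IOException {
              TokenStream ts = analyzer.tokenStream("content", new StringReader(text));
              CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
              ts.reset();
              while (ts.incrementToken()) {
                  System.out.print("[" + term.toString() + "] ");
              }
              ts.end();
              ts.close();
              System.out.println();
          }

          public static void main(String[] args) throws IOException {
              String text = "中华人民共和国成立了";
              // StandardAnalyzer falls back to one token per CJK character:
              // [中] [华] [人] [民] [共] [和] [国] [成] [立] [了]
              printTokens(new StandardAnalyzer(Version.LUCENE_46), text);
              // A dictionary-based segmenter such as Ansj would be expected to
              // produce word-level tokens instead, e.g. [中华人民共和国] [成立] [了].
          }
      }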

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: deyinchen
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: 24h
                Remaining: 24h
                Logged: Not Specified