Solr / SOLR-1336

Add support for Lucene's SmartChineseAnalyzer

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: Schema and Analysis
    • Labels: None

    Description

    SmartChineseAnalyzer was contributed to Lucene; it indexes Simplified Chinese text as words.

    If factories for the tokenizer and the word token filter are added to Solr, the analyzer can be used. However, there should be a sample config or wiki entry showing how to apply the built-in stopwords list. The list doesn't contain actual stopwords, but it must be applied to prevent punctuation from being indexed.
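    To illustrate why the stopwords list matters, here is a minimal Java sketch of what a stop filter does downstream of the tokenizer: it drops every token found in the list, which here means punctuation. It does not use Lucene's API; the class name, the sample tokens and the stopword entries are hypothetical, chosen for illustration, and are not the contents of the shipped smartcn stopwords file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PunctuationStopFilterSketch {

    // Hypothetical stand-in for the built-in smartcn stopwords list.
    // Per the issue, that list holds punctuation rather than real stopwords;
    // these entries are illustrative only.
    static final Set<String> PUNCTUATION_STOPWORDS =
            Set.of(",", ".", "!", "?", "，", "。", "！", "？", "、");

    // Keeps only the tokens that are not in the stopword set,
    // which is the job a stop filter performs after tokenization.
    static List<String> applyStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopwords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical segmenter output for "我们是中国人。":
        // three word tokens followed by a punctuation token.
        List<String> tokens = List.of("我们", "是", "中国人", "。");
        System.out.println(applyStopwords(tokens, PUNCTUATION_STOPWORDS));
        // prints [我们, 是, 中国人]
    }
}
```

    Without the filter step, the "。" token would be indexed as if it were a word, which is the problem the sample config or wiki entry should help users avoid.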

    Note: we did some refactoring and cleanup on this analyzer recently, so this would be much easier to do after the next Lucene update.
    The analyzer has also been moved out of the analyzers jar because of its size and now builds into its own smartcn jar, which would need to be added if this feature is desired.

    Attachments

    1. SOLR-1336.patch (4 kB, Robert Muir)
    2. SOLR-1336.patch (45 kB, Robert Muir)
    3. SOLR-1336.patch (45 kB, Robert Muir)


    People

    • Assignee: Robert Muir (rcmuir)
    • Reporter: Robert Muir (rcmuir)
    • Votes: 2
    • Watchers: 2
