[SOLR-1336] Add support for lucene's SmartChineseAnalyzer - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.1, 4.0-ALPHA
Component/s: Schema and Analysis
Labels: None

Description

SmartChineseAnalyzer was contributed to Lucene; it indexes Simplified Chinese text as words.

If factories for its tokenizer and word token filter are added to Solr, it can be used there. There should also be a sample config or wiki entry showing how to apply the built-in stopwords list.
This is needed because the list doesn't contain actual stopwords, but must still be applied to prevent indexing punctuation.

Note: we did some refactoring/cleanup on this analyzer recently, so it would be much easier to do this after the next Lucene update.
It has also been moved out of -analyzers.jar due to its size and now builds in its own smartcn jar file, so that jar would need to be added if this feature is desired.
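
A minimal sketch of what such a sample config might look like in schema.xml, given the tokenizer, word token filter, and stopwords list described above. The field type name `text_zh_smart` is hypothetical. The factory class names and the stopwords resource path are not stated in this issue; they are assumptions based on what later Solr releases shipped and should be checked against the version in use. The smartcn jar is assumed to be on Solr's classpath.

```xml
<!-- Sketch only: the field type name is hypothetical, and the factory class
     names and stopwords path are assumptions to verify against your Solr release. -->
<fieldType name="text_zh_smart" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split the input into sentences first. -->
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <!-- Segment each sentence into words. -->
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
    <!-- Apply the list bundled in the smartcn jar. Per the description it holds
         no real stopwords, but it keeps punctuation out of the index. -->
    <filter class="solr.StopFilterFactory"
            words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
  </analyzer>
</fieldType>
```

Keeping the stop filter as a separate step makes it easy to point `words` at a custom list if real stopwords are wanted in addition to the punctuation entries.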

Attachments

- SOLR-1336.patch, 03/Sep/09 20:44, 45 kB, Robert Muir
- SOLR-1336.patch, 09/Aug/09 01:22, 45 kB, Robert Muir
- SOLR-1336.patch, 08/Aug/09 21:28, 4 kB, Robert Muir

People

Assignee: Robert Muir

Reporter: Robert Muir

Votes: 2

Watchers: 2

Dates

Created: 05/Aug/09 10:36

Updated: 30/Mar/11 15:46

Resolved: 02/Nov/10 15:03