Details
- Type: New Feature
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
Description
A Lucene / Solr implementation of the KStem stemmer. Full credit goes to Harry Wagner for adapting the Lucene version found here:
http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi
Background discussion of this stemmer (including licensing issues) can be found in this thread:
http://www.nabble.com/Embedded-about-50--faster-for-indexing-tf4325720.html#a12376295
I've made some minor changes to KStemFilterFactory so that it compiles cleanly against trunk:
1) removed some unnecessary imports
2) changed the init() method parameters introduced by SOLR-215
3) moved KStemFilterFactory into package org.apache.solr.analysis
Once compiled and included in your Solr war (or as a jar in your lib directory), the KStem filter can be used in your schema very easily:
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
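Stemming generally needs to be applied at query time as well, so that query terms are reduced to the same stems that were indexed. A minimal sketch of a matching query analyzer follows. It is an assumption that it simply mirrors the index chain above (it is not part of the attached patch), and the cacheSize value is only carried over from that example:

```xml
<!-- Assumed query-side counterpart to the index analyzer; same filter chain, same order -->
<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- cacheSize copied from the index example, not a tuned value -->
  <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

Both analyzer elements would sit inside the same fieldType definition in schema.xml.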
Attachments
Issue Links
- is duplicated by: LUCENE-152 [PATCH] KStem for Lucene (Closed)