Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
Description
While using the PorterStemmer, I found that there were often times that it was far to aggressive in it's stemming. In my particular case it is unrealistic to provide a protected word list which captures all possible words which should not be stemmed. To avoid this I proposed a solution whereby we store the original token as well as the stemmed token so exact searches would always work. Based on discussions on the mailing list Ahmet Arslan, I believe the attached patch to KStemmer provides the desired capabilities through a configuration parameter. This largely is a copy of the org.apache.lucene.wordnet.SynonymTokenFilter.
Attachments
Attachments
Issue Links
- is superceded by
-
LUCENE-4817 Add KeywordRepeaterFilter to emit tokens twice once as keyword and once not as keyword
- Closed