SOLR-3231: Add the ability to KStemmer to preserve the original token when stemming


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3, 6.0
    • Component/s: Schema and Analysis
    • Labels: None

    Description

      While using the PorterStemmer, I found that it was often far too aggressive in its stemming. In my particular case it is unrealistic to provide a protected word list that captures every word which should not be stemmed. To avoid this, I proposed a solution whereby we store the original token as well as the stemmed token, so that exact searches always work. Based on discussions with Ahmet Arslan on the mailing list, I believe the attached patch to KStemmer provides the desired capability through a configuration parameter. The patch is largely a copy of org.apache.lucene.wordnet.SynonymTokenFilter.
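      To illustrate the idea (not the attached patch itself), here is a minimal Python sketch of a token stream that keeps the original token and stacks the stemmed form at the same position. Everything in it is an assumption made for illustration: the `Token` type, the `toy_stem` suffix stripper (a stand-in for a real stemmer, not KStem or Porter), the `preserve_original` flag name, and the choice to emit the original before the stem.

```python
from typing import Callable, Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    text: str
    # 1 = token starts a new position; 0 = token is stacked on the previous one
    position_increment: int


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_preserving_original(
    words: Iterable[str],
    stem: Callable[[str], str] = toy_stem,
    preserve_original: bool = True,
) -> Iterator[Token]:
    """Yield stemmed tokens, optionally keeping the original at the same position."""
    for word in words:
        stemmed = stem(word)
        if preserve_original and stemmed != word:
            # Emit the original first, then the stem with increment 0 so that
            # both terms occupy the same position in the index.
            yield Token(word, 1)
            yield Token(stemmed, 0)
        else:
            # Either the flag is off or stemming changed nothing: one token only.
            yield Token(stemmed, 1)


if __name__ == "__main__":
    for token in stem_preserving_original(["running", "fox", "boxes"]):
        print(token)
```

      Because the stem is stacked with a position increment of 0, phrase and proximity queries see the same positions as before, while an exact-match query on the unstemmed term can still hit. Tokens that the stemmer leaves unchanged are emitted once, which avoids indexing duplicates.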


          People

            Assignee: Unassigned
            Reporter: Jamie Johnson (jej2003)
            Votes: 0
            Watchers: 2

