Solr / SOLR-3231

Add the ability to KStemmer to preserve the original token when stemming

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3, 5.0
    • Component/s: Schema and Analysis
    • Labels:
      None

      Description

      While using the PorterStemmer, I found that it was often far too aggressive in its stemming. In my particular case it is unrealistic to provide a protected word list that captures every word that should not be stemmed. To avoid this, I proposed a solution whereby we store the original token as well as the stemmed token, so exact searches always work. Based on discussions on the mailing list with Ahmet Arslan, I believe the attached patch to KStemmer provides the desired capability through a configuration parameter. It is largely a copy of org.apache.lucene.wordnet.SynonymTokenFilter.
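      The idea of keeping the original token next to its stem can be sketched outside Lucene. The following is a minimal illustration only, not the attached patch: the `Token` class, the `toy_stem` function, and the `preserve_original` parameter name are all hypothetical, and the suffix-stripping rule is a stand-in for a real stemmer such as KStem. It assumes the stemmed form is emitted at the same position as the original (position increment 0), the way a synonym-style filter stacks tokens.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Token:
    text: str
    # 1 = token starts a new position; 0 = token shares the previous position
    position_increment: int


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_preserving_original(
    words: Iterable[str],
    stem: Callable[[str], str] = toy_stem,
    preserve_original: bool = True,  # hypothetical name for the config parameter
) -> Iterator[Token]:
    """Yield stemmed tokens, optionally preceded by the unstemmed original."""
    for word in words:
        stemmed = stem(word)
        if preserve_original and stemmed != word:
            # Emit the original first, then stack the stem on the same position
            # so that both exact and stemmed queries can match.
            yield Token(word, 1)
            yield Token(stemmed, 0)
        else:
            # Nothing to preserve (or preservation disabled): one token only.
            yield Token(stemmed, 1)


if __name__ == "__main__":
    for token in stem_preserving_original(["running", "dogs", "cat"]):
        print(token)
```

      Skipping the duplicate when the stem equals the original keeps the index from storing the same term twice at one position; whether the attached patch does the same is not stated in this issue.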

      1. KStemFilter.patch
        4 kB
        Jamie Johnson

        Issue Links

          Activity

          Jamie Johnson created issue -
          Jamie Johnson made changes -
          Field Original Value New Value
          Attachment KStemFilter.patch [ 12517873 ]
          Jamie Johnson made changes -
          Attachment KStemFilter.patch [ 12517873 ]
          Jamie Johnson made changes -
          Attachment KStemFilter.patch [ 12517874 ]
          Mark Miller made changes -
          Fix Version/s 4.0 [ 12314992 ]
          Affects Version/s 4.0 [ 12314992 ]
          Robert Muir made changes -
          Fix Version/s 4.1 [ 12321141 ]
          Fix Version/s 4.0 [ 12314992 ]
          Mark Miller made changes -
          Fix Version/s 4.2 [ 12323893 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.1 [ 12321141 ]
          Simon Willnauer made changes -
          Link This issue is superseded by LUCENE-4817 [ LUCENE-4817 ]
          Simon Willnauer made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Erick Erickson made changes -
          Fix Version/s 4.3 [ 12324128 ]
          Fix Version/s 4.2 [ 12323893 ]
          Uwe Schindler made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Unassigned
              Reporter:
              Jamie Johnson
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved: