SOLR-379

KStem Token Filter


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: search
    • Labels: None

    Description

      A Lucene / Solr implementation of the KStem stemmer. Full credit goes to Harry Wagner for adapting the Lucene version found here:
      http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi

      Background discussion of this stemmer (including licensing issues) can be found in this thread:
      http://www.nabble.com/Embedded-about-50--faster-for-indexing-tf4325720.html#a12376295

      I've made some minor changes to KStemFilterFactory so that it compiles cleanly against trunk:
      1) removed some unnecessary imports
      2) updated the init() method parameters to match the change introduced by SOLR-215
      3) moved KStemFilterFactory into package org.apache.solr.analysis

      Once compiled and included in your Solr WAR (or as a jar in your lib directory), the KStem filter can be used in your schema very easily:

      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
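
      For context, an analyzer element normally sits inside a fieldType in schema.xml. The following is only a minimal sketch, assuming the KStemFilterFactory from the attachment is available on the classpath. The field type name text_kstem and the field name body are hypothetical, and the query-side analyzer simply mirrors the index-side chain so that query terms are stemmed the same way as indexed terms:

      ```xml
      <!-- Hypothetical field type name; adjust to your own schema -->
      <fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
          <filter class="solr.StandardFilterFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <!-- Assumption: same chain at query time so stems match the index -->
        <analyzer type="query">
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
          <filter class="solr.StandardFilterFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
      </fieldType>

      <!-- Hypothetical field that uses the type above -->
      <field name="body" type="text_kstem" indexed="true" stored="true"/>
      ```

      The lowercase filter is placed before the KStem filter here, as in the example above, so the stemmer only sees lowercased tokens.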

      Attachments

        1. KStemSolr.zip
          118 kB
          Pieter Berkel

            People

              Assignee: Unassigned
              Reporter: Pieter Berkel (pberkel)
              Votes: 0
              Watchers: 1
