  Lucene - Core / LUCENE-152

[PATCH] KStem for Lucene

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3, 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels:
      None
    • Environment:

      Operating System: other
      Platform: Other

    • Bugzilla Id:
      23782

      Description

      September 10th, 2003 contribution from "Sergio Guzman-Lara" <guzman@cs.umass.edu>

      Original email:

      Hi all,

      I have ported the Kstem stemmer to Java and integrated it into
      Lucene. You can get the source code (Kstem.jar) from the following website:

      http://ciir.cs.umass.edu/downloads/

      Just click on "KStem Java Implementation". You will need to register
      your e-mail (for free, of course) with the CIIR, the Center for
      Intelligent Information Retrieval at UMass, to get an access code.

      Content of Kstem.jar:

      java/org/apache/lucene/analysis/KStemData1.java
      java/org/apache/lucene/analysis/KStemData2.java
      java/org/apache/lucene/analysis/KStemData3.java
      java/org/apache/lucene/analysis/KStemData4.java
      java/org/apache/lucene/analysis/KStemData5.java
      java/org/apache/lucene/analysis/KStemData6.java
      java/org/apache/lucene/analysis/KStemData7.java
      java/org/apache/lucene/analysis/KStemData8.java
      java/org/apache/lucene/analysis/KStemFilter.java
      java/org/apache/lucene/analysis/KStemmer.java

      KStemData1.java, ..., KStemData8.java   Contain several lists of words used by Kstem
      KStemmer.java                           Implements the Kstem algorithm
      KStemFilter.java                        Extends TokenFilter, applying Kstem
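
      The split described above (word lists, a stemmer that consults them,
      and a filter that applies the stemmer to each token) can be pictured
      with a toy sketch. This is not the Kstem algorithm and it does not use
      the Lucene API: the class name, the tiny lexicon, and the suffix rules
      are all hypothetical and chosen only to illustrate a stemmer that
      accepts a shortened form when that form appears in a word list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only: NOT the Kstem algorithm and NOT the Lucene API.
class ToyDictionaryStemmer {
    // Hypothetical stand-in for the word lists held in KStemData1..8.
    private static final Set<String> LEXICON =
            Set.of("cat", "index", "search", "stem", "use");

    // Hypothetical suffix rules: {suffix to strip, replacement to append}.
    private static final String[][] RULES = {
        {"ing", ""}, {"ing", "e"}, {"ed", ""}, {"ed", "e"}, {"es", ""}, {"s", ""}
    };

    // Role of the stemmer: return a shortened form only if the lexicon contains it.
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (LEXICON.contains(w)) {
            return w; // already a known word
        }
        for (String[] rule : RULES) {
            if (w.endsWith(rule[0])) {
                String candidate = w.substring(0, w.length() - rule[0].length()) + rule[1];
                if (LEXICON.contains(candidate)) {
                    return candidate;
                }
            }
        }
        return w; // no lexicon-backed form found: leave the token unchanged
    }

    // Role of the filter: apply the stemmer to every token in a stream.
    static List<String> stemAll(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(stem(token));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(stemAll(List.of("Cats", "indexes", "using", "news")));
    }
}
```

      Because a candidate is accepted only when it is a known word, a token
      such as "news" is left alone here instead of being cut down to "new".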

      To compile

      Unpack Kstem.jar into Lucene's "src" directory and compile it
      there.

      What is Kstem?

      A stemmer designed by Bob Krovetz (for more information see
      http://ciir.cs.umass.edu/pubfiles/ir-35.pdf).

      Copyright issues

      This is open source. The actual license agreement is included at the
      top of every source file.

      Any comments/questions/suggestions are welcome,

      Sergio Guzman-Lara
      Senior Research Fellow
      CIIR UMass

        Attachments

        1. kstemTestData.zip
          54 kB
          Robert Muir
        2. LUCENE-152_alt.patch
          1 kB
          Robert Muir
        3. LUCENE-152_optimization.patch
          2 kB
          Yonik Seeley
        4. LUCENE-152_optimization.patch
          0.9 kB
          Yonik Seeley
        5. LUCENE-152.patch
          388 kB
          Robert Muir
        6. lucid_kstem.tgz
          650 kB
          Yonik Seeley

              People

              • Assignee:
                rcmuir Robert Muir
                Reporter:
                otis@apache.org Otis Gospodnetic
              • Votes:
                9
                Watchers:
                7