[LUCENE-2055] Fix buggy stemmers and Remove duplicate analysis functionality - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.9.4, 3.0.3, 3.1, 4.0-ALPHA
Component/s: modules/analysis
Labels: None
Lucene Fields: New

Description

I would like to remove the custom stemmers in the following packages and have their analyzers use a SnowballStemFilter instead:

analyzers/fr
analyzers/nl
analyzers/ru

Below are excerpts from this code, in which each stemmer states that it is based on the Snowball algorithm.
I think we should delete all of this custom stemming code in favor of the actual Snowball package.

/**
 * A stemmer for French words. 
 * <p>
 * The algorithm is based on the work of
 * Dr Martin Porter on his snowball project<br>
 * refer to http://snowball.sourceforge.net/french/stemmer.html<br>
 * (French stemming algorithm) for details
 * </p>
 */

public class FrenchStemmer {

/**
 * A stemmer for Dutch words. 
 * <p>
 * The algorithm is an implementation of
 * the <a href="http://snowball.tartarus.org/algorithms/dutch/stemmer.html">dutch stemming</a>
 * algorithm in Martin Porter's snowball project.
 * </p>
 */
public class DutchStemmer {

/**
 * Russian stemming algorithm implementation (see http://snowball.sourceforge.net for detailed description).
 */
class RussianStemmer
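
For illustration, the proposed replacement could look roughly like the sketch below: an analyzer chain that delegates stemming to the Snowball package instead of a hand-written stemmer class. This is a minimal sketch, not the attached patch. It assumes a recent Lucene (8.x or 9.x) with the core and analysis-common modules on the classpath, where, as far as I know, the filter class is named SnowballFilter. The class name SnowballFrenchSketch, the field name "body", and the sample sentence are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical sketch: assumes the Lucene 8.x/9.x analysis API.
public class SnowballFrenchSketch {

    // Builds an analyzer whose stemming step is the Snowball "French" stemmer
    // rather than a custom stemmer class.
    static Analyzer frenchAnalyzer() {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new StandardTokenizer();
                // Snowball stemmers expect lowercased input.
                TokenStream result = new LowerCaseFilter(source);
                // The name selects the Snowball stemmer for that language.
                result = new SnowballFilter(result, "French");
                return new TokenStreamComponents(source, result);
            }
        };
    }

    // Runs the analyzer over the text and collects the emitted terms.
    static List<String> stems(String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (Analyzer analyzer = frenchAnalyzer();
             TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Prints one stemmed term per input token.
        System.out.println(stems("Les chevaux mangent"));
    }
}
```

Under the same assumptions, swapping "French" for "Dutch" or "Russian" would cover the other two packages, which is the point of the proposal: one shared filter replaces three separate stemmer implementations.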

Attachments

File               Date             Size    Uploaded by
LUCENE-2055.patch  01/Feb/10 09:22  163 kB  Robert Muir
LUCENE-2055.patch  01/Feb/10 12:46  163 kB  Robert Muir
LUCENE-2055.patch  03/Feb/10 14:24  169 kB  Robert Muir
LUCENE-2055.patch  03/Feb/10 17:49  168 kB  Robert Muir
LUCENE-2055.patch  04/Feb/10 20:11  179 kB  Robert Muir

Issue Links

depends upon:

- LUCENE-2206 integrate snowball stopword lists (Closed)
- LUCENE-2198 support protected words in Stemming TokenFilters (Closed)

People

Assignee: Robert Muir

Reporter: Robert Muir

Dates

Created: 11/Nov/09 15:42

Updated: 28/Aug/22 12:13

Resolved: 29/Oct/10 14:51