[LUCENE-233] [PATCH] analyzer refactoring based on CVS HEAD from 6/21/2004 - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: modules/analysis
Labels: None
Environment: Operating System: All, Platform: All
Bugzilla Id: 29756

Description

Hello,

As mentioned in previous exchanges, notably with Grant Ingersoll, I added some
new classes to the "analysis" package to meet the requirements of the feature
request in Bugzilla (http://issues.apache.org/bugzilla/show_bug.cgi?id=28182)
and did some refactoring while I was under the hood. This is an overview of
the class hierarchies after my changes:

-Analyzer
--CustomAnalyzer (new abstract class largely based on Grant's BaseAnalyzer)
--AbstractAnalyzer (new abstract class)
---RussianAnalyzer
---GermanAnalyzer
---etc.

-Tokenizer
--CloneableTokenizer (new abstract class)
---StandardTokenizer
---CharTokenizer
---CJKTokenizer
---etc.

-TokenFilter
--CloneableTokenFilter (new abstract class)
---AbstractStemFilter (new abstract class)
----RussianStemFilter
----GermanStemFilter
----etc.

-Stemmer (very simple new interface used in AbstractStemFilter)
--PorterStemmer
--RussianStemmer
--etc.
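
To illustrate how a simple Stemmer interface could plug into an abstract stem filter, here is a minimal sketch. It is not the code from the attached patch: the method name `stem`, the use of `Iterator<String>` as a stand-in for a Lucene token stream, and the `ToyStemmer` and `ToyStemFilter` classes are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StemFilterSketch {

    /** Hypothetical one-method interface; the signature in the patch may differ. */
    public interface Stemmer {
        String stem(String term);
    }

    /**
     * Stand-in for AbstractStemFilter: wraps a token source and passes each
     * token through a Stemmer. Iterator<String> replaces Lucene's TokenStream
     * here so the sketch stays self-contained.
     */
    public abstract static class AbstractStemFilter implements Iterator<String> {
        private final Iterator<String> input;
        private final Stemmer stemmer;

        protected AbstractStemFilter(Iterator<String> input, Stemmer stemmer) {
            this.input = input;
            this.stemmer = stemmer;
        }

        @Override
        public boolean hasNext() {
            return input.hasNext();
        }

        @Override
        public String next() {
            return stemmer.stem(input.next());
        }
    }

    /** Toy stemmer that strips one trailing "s" from longer words; not a real algorithm. */
    public static class ToyStemmer implements Stemmer {
        @Override
        public String stem(String term) {
            boolean strip = term.length() > 3 && term.endsWith("s");
            return strip ? term.substring(0, term.length() - 1) : term;
        }
    }

    /** A language-specific filter only has to choose its Stemmer. */
    public static class ToyStemFilter extends AbstractStemFilter {
        public ToyStemFilter(Iterator<String> input) {
            super(input, new ToyStemmer());
        }
    }

    /** Runs every token through ToyStemFilter and collects the results. */
    public static List<String> stemAll(List<String> tokens) {
        List<String> out = new ArrayList<>();
        Iterator<String> filter = new ToyStemFilter(tokens.iterator());
        while (filter.hasNext()) {
            out.add(filter.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(stemAll(Arrays.asList("analyzers", "tokens", "is")));
    }
}
```

With this shape, each language-specific filter reduces to a constructor that selects the matching Stemmer, which seems to be the kind of duplication the hierarchy above is meant to remove.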

In the attached zip file there are 3 diff files (core.analysis,
sandbox.analysis, and sandbox.analysis.snowball) and a zip containing the new
classes for org.apache.lucene.analysis in the Lucene core. I tried to minimize
the irrelevant code changes (e.g. style, spaces, etc.) in the diffs while
conforming to the code formatting guidelines outlined by Otis. I think there
were a number of classes in the "analysis" package that didn't conform, so
these diffs may have a lot of noise because I reformatted those classes with
my IDE, sorry. If the diffs are too painful, let me know and I'll try to prune
them.

If there is a TODO list specific to Analyzers, are the items below on that
list?

1) move German and Russian packages to sandbox (I think this is on the Lucene
TODO list)
2) Analyzer class renaming such that dynamic configuration could return
classes like Analyzer_ru, Analyzer_de, Analyzer_fr, etc. based on the class
naming scheme "Analyzer_{Locale.toString}"
3) Documentation
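
The naming scheme in item 2 could be resolved along these lines. This is only a sketch under assumptions: the `basePackage` parameter, the helper names, and the fallback from a country-specific name (e.g. Analyzer_fr_FR) to a language-only name (Analyzer_fr), loosely modeled on how `ResourceBundle` falls back, are not part of the proposal above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class LocaleAnalyzerLookup {

    /**
     * Builds candidate class names from most to least specific, e.g.
     * "pkg.Analyzer_fr_FR" then "pkg.Analyzer_fr". The fallback order is an
     * assumption; the issue only specifies "Analyzer_{Locale.toString}".
     */
    public static List<String> candidateClassNames(String basePackage, Locale locale) {
        List<String> names = new ArrayList<>();
        String suffix = locale.toString();
        while (!suffix.isEmpty()) {
            names.add(basePackage + ".Analyzer_" + suffix);
            int cut = suffix.lastIndexOf('_');
            if (cut < 0) {
                break;
            }
            // Drop the last component (variant or country) and retry.
            suffix = suffix.substring(0, cut);
        }
        return names;
    }

    /** Returns the first candidate class found on the classpath, if any. */
    public static Optional<Class<?>> resolve(String basePackage, Locale locale) {
        for (String name : candidateClassNames(basePackage, locale)) {
            try {
                Class<?> found = Class.forName(name);
                return Optional.<Class<?>>of(found);
            } catch (ClassNotFoundException e) {
                // Not present under this name; try the next, less specific one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // "org.example.analysis" is a placeholder package, not a real Lucene package.
        System.out.println(candidateClassNames("org.example.analysis", Locale.FRANCE));
        System.out.println(resolve("org.example.analysis", Locale.GERMAN).isPresent());
    }
}
```

A caller would still need to instantiate the resolved class and verify that it is actually an Analyzer subclass; that step is left out here because it depends on the Lucene version in use.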

Questions, comments, feedback, and criticism are all welcome.

Regards,
RBP

PS - Thanks Grant!

Attachments

ASF.LICENSE.NOT.GRANTED--analysis.zip
23/Jun/04 17:47
38 kB
Rasik Pandey

People

Assignee: Unassigned

Reporter: Rasik Pandey

Votes: 0

Watchers: 0

Dates

Created: 23/Jun/04 17:45

Updated: 28/Aug/22 11:17

Resolved: 01/Jan/08 12:05