Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Operating System: other
Platform: Other
28182
Description
Hi All,
I got sick of writing Analyzers, so I have reworked some of the Analyzer and Filter code, turning
TokenStream, Tokenizer, and TokenFilter into interfaces. I then created a BaseAnalyzer class on which you
set a Tokenizer and a list of TokenFilters. Its tokenStream() method applies the tokenizer, then loops
over the list of TokenFilters, applying each one in order and returning the last one, just as I am sure
you have done many times before. One requirement for this to work is that Filters and Tokenizers
must allow any state information to be re-initialized through the init() method on TokenStream.
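
To make the idea concrete, here is a rough, self-contained sketch of the chaining described above. It is not the code from the patch: only the names TokenStream, Tokenizer, TokenFilter, BaseAnalyzer, tokenStream() and init() come from the description. Everything else (String tokens, setReader(), setInput(), setTokenizer(), addFilter(), and the three *Sketch* classes) is assumed for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for the interfaces; tokens are plain Strings here.
interface TokenStream {
    String next() throws IOException; // next token, or null when exhausted
    void init();                      // reset any per-use state
}

interface Tokenizer extends TokenStream {
    void setReader(Reader reader);    // assumed way to hand over the input
}

interface TokenFilter extends TokenStream {
    void setInput(TokenStream input); // assumed way to chain onto a stream
}

// Splits the input on whitespace.
class WhitespaceSketchTokenizer implements Tokenizer {
    private Reader reader;
    public void setReader(Reader reader) { this.reader = reader; }
    public void init() { }
    public String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isWhitespace((char) c)) {
                if (sb.length() > 0) return sb.toString();
            } else {
                sb.append((char) c);
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}

// Stateless filter: lower-cases every token.
class LowerCaseSketchFilter implements TokenFilter {
    private TokenStream input;
    public void setInput(TokenStream input) { this.input = input; }
    public void init() { }
    public String next() throws IOException {
        String t = input.next();
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}

// Stateful filter: passes at most max tokens, so init() must reset the count.
class LimitSketchFilter implements TokenFilter {
    private final int max;
    private int seen;
    private TokenStream input;
    LimitSketchFilter(int max) { this.max = max; }
    public void setInput(TokenStream input) { this.input = input; }
    public void init() { seen = 0; }
    public String next() throws IOException {
        if (seen >= max) return null;
        String t = input.next();
        if (t != null) seen++;
        return t;
    }
}

class BaseAnalyzer {
    private Tokenizer tokenizer;
    private final List<TokenFilter> filters = new ArrayList<>();

    void setTokenizer(Tokenizer tokenizer) { this.tokenizer = tokenizer; }
    void addFilter(TokenFilter filter) { filters.add(filter); }

    // Re-initializes the tokenizer and each filter, chains them in list
    // order, and returns the last stream in the chain.
    TokenStream tokenStream(Reader reader) {
        tokenizer.init();
        tokenizer.setReader(reader);
        TokenStream current = tokenizer;
        for (TokenFilter filter : filters) {
            filter.init();
            filter.setInput(current);
            current = filter;
        }
        return current;
    }
}

class BaseAnalyzerDemo {
    static List<String> analyze(BaseAnalyzer analyzer, String text) {
        List<String> out = new ArrayList<>();
        try {
            TokenStream ts = analyzer.tokenStream(new StringReader(text));
            for (String t = ts.next(); t != null; t = ts.next()) out.add(t);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        BaseAnalyzer a = new BaseAnalyzer();
        a.setTokenizer(new WhitespaceSketchTokenizer());
        a.addFilter(new LowerCaseSketchFilter());
        a.addFilter(new LimitSketchFilter(2));
        System.out.println(analyze(a, "Hello Big World"));   // [hello, big]
        System.out.println(analyze(a, "Second Call Works")); // [second, call]
    }
}
```

The second call only works because LimitSketchFilter resets its counter in init(), which is the re-initialization requirement in miniature. In this sketch the same tokenizer and filter instances are reused on every call, so a single BaseAnalyzer could not safely serve two streams at once.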
I also created AbstractTokenizer and AbstractTokenFilter, which are trivial implementations of Tokenizer
and TokenFilter respectively. I have kept all existing tokenizers and filters backwards compatible.
Let me know whether you like it or not, and what changes you would like me to make. I ran all the
regression tests and they all passed. I also wrote TestBaseAnalyzer to test the new Analyzer; see that
test for usage. I haven't done a full-scale indexing test on it yet, but will soon.