Lucene - Core / LUCENE-210

[PATCH] Never write an Analyzer again

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None
    • Environment: Operating System: other
      Platform: Other
    • Bugzilla Id: 28182

    Description

      Hi All,

      I got sick of writing Analyzers, so I have reworked some of the Analyzer and Filter code by making
      TokenStream, Tokenizer, and TokenFilter interfaces. I then created a BaseAnalyzer class on which you
      set a tokenizer and a list of TokenFilters. The tokenStream() method applies the tokenizer, then
      loops over the list of TokenFilters, applying each one in order and returning the last one, just as I
      am sure you have done many a time before. One requirement for this to work is that Filters and
      Tokenizers must allow any state information to be re-initialized through the init() method on
      TokenStream.
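
      The chaining described above can be sketched roughly as follows. This is a minimal,
      self-contained illustration and not the code in the attached patch: the setReader() and
      setInput() methods, the String-based tokens, the two example filters, and the collect()
      helper are all assumptions made for the sketch. Only the names TokenStream, Tokenizer,
      TokenFilter, BaseAnalyzer, tokenStream(), and init() come from the description.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical interfaces modeled on the description; not the real Lucene API.
interface TokenStream {
    String next() throws IOException; // returns null at end of stream
    void init();                      // re-initializes any per-stream state
}

interface Tokenizer extends TokenStream {
    void setReader(Reader reader);    // assumed way to hand over the input text
}

interface TokenFilter extends TokenStream {
    void setInput(TokenStream input); // assumed way to chain onto the previous stage
}

// Splits the input on whitespace.
class WhitespaceTokenizer implements Tokenizer {
    private Reader reader;
    public void setReader(Reader reader) { this.reader = reader; }
    public void init() { }
    public String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (sb.length() > 0) break; // end of the current token
            } else {
                sb.append((char) c);
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}

// Stateless filter: lower-cases every token.
class LowerCaseFilter implements TokenFilter {
    private TokenStream input;
    public void setInput(TokenStream input) { this.input = input; }
    public void init() { }
    public String next() throws IOException {
        String t = input.next();
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}

// Stateful filter: passes through at most max tokens, so init() must reset the counter.
class LimitFilter implements TokenFilter {
    private final int max;
    private int emitted;
    private TokenStream input;
    LimitFilter(int max) { this.max = max; }
    public void setInput(TokenStream input) { this.input = input; }
    public void init() { emitted = 0; }
    public String next() throws IOException {
        if (emitted >= max) return null;
        String t = input.next();
        if (t != null) emitted++;
        return t;
    }
}

class BaseAnalyzer {
    private Tokenizer tokenizer;
    private List<TokenFilter> filters = new ArrayList<>();

    void setTokenizer(Tokenizer tokenizer) { this.tokenizer = tokenizer; }
    void setFilters(List<TokenFilter> filters) { this.filters = filters; }

    // Applies the tokenizer, then each filter in order, and returns the last stage.
    TokenStream tokenStream(Reader reader) {
        tokenizer.setReader(reader);
        tokenizer.init();
        TokenStream current = tokenizer;
        for (TokenFilter filter : filters) {
            filter.setInput(current);
            filter.init(); // the same filter instance is reused for every stream
            current = filter;
        }
        return current;
    }

    // Hypothetical helper that drains a stream into a list.
    static List<String> collect(TokenStream stream) {
        List<String> tokens = new ArrayList<>();
        try {
            for (String t = stream.next(); t != null; t = stream.next()) tokens.add(t);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }
}
```

      In this sketch the tokenizer and filter instances are shared across calls to
      tokenStream(), which is why init() has to reset their state. It also means that only one
      stream per analyzer can be in use at a time, so the sketch is not thread-safe.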

      I also created AbstractTokenizer and AbstractTokenFilter, which are trivial implementations of
      Tokenizer and TokenFilter respectively. I have kept all existing tokenizers and filters backwards
      compatible.

      Let me know whether you like or dislike it and what changes you would like me to make. I ran all
      the regression tests and they all passed. I also wrote a TestBaseAnalyzer to test my new Analyzer;
      see that test for usage of the Analyzer. I haven't done a full-scale indexing test on it yet, but
      will soon.

      Attachments

        1. ASF.LICENSE.NOT.GRANTED--analyzer.tar.gz
          2 kB
          Grant Ingersoll
        2. ASF.LICENSE.NOT.GRANTED--analyzer.patch
          25 kB
          Grant Ingersoll
        3. ASF.LICENSE.NOT.GRANTED--analysis.zip
          38 kB
          Otis Gospodnetic

          People

            Assignee: Unassigned
            Reporter: Grant Ingersoll (grant_ingersoll@yahoo.com)
            Votes: 0
            Watchers: 1
