Lucene - Core / LUCENE-3236

Make LowerCaseFilter and StopFilter keyword aware, similar to PorterStemFilter

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.9, 6.0
    • Component/s: modules/analysis
    • N/A

    • New, Patch Available

    Description

      PorterStemFilter detects whether a term has been marked as a "keyword" by KeywordMarkerFilter (KeywordAttribute.isKeyword() == true) and, if so, skips stemming it.

      The suggestion is to add the same functionality to other filters where it applies. I think it is particularly applicable to LowerCaseFilter (i.e., if a term is a keyword, leave its case unchanged) and StopFilter (if a term is a keyword, don't filter it out even if it looks like a stop word).
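The proposed behaviour can be illustrated with a small standalone model. This is only a sketch and not the attached patch: it does not use Lucene's TokenStream API, and the `Token` class, the method names, and the sample stop set are hypothetical stand-ins for a term together with its KeywordAttribute flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class KeywordAwareSketch {
    /** Hypothetical stand-in for a term and its KeywordAttribute flag. */
    public static final class Token {
        public final String text;
        public final boolean keyword;

        public Token(String text, boolean keyword) {
            this.text = text;
            this.keyword = keyword;
        }
    }

    /** Lowercases every token except those marked as keywords. */
    public static List<Token> lowerCase(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        for (Token t : tokens) {
            out.add(t.keyword ? t : new Token(t.text.toLowerCase(Locale.ROOT), false));
        }
        return out;
    }

    /** Drops stop words, but keeps any token marked as a keyword. */
    public static List<Token> removeStopWords(List<Token> tokens, Set<String> stopWords) {
        List<Token> out = new ArrayList<>();
        for (Token t : tokens) {
            if (t.keyword || !stopWords.contains(t.text)) {
                out.add(t);
            }
        }
        return out;
    }

    /** Joins the token texts with single spaces, for display. */
    public static String texts(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "The" is marked as a keyword, so it keeps its case and survives stop-word removal.
        List<Token> in = Arrays.asList(
            new Token("The", true), new Token("Who", false), new Token("the", false));
        Set<String> stop = Collections.singleton("the");
        System.out.println(texts(removeStopWords(lowerCase(in), stop)));
    }
}
```

In a real analysis chain the keyword flag would be set upstream by KeywordMarkerFilter; here it is passed in by hand to keep the example free of dependencies.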

      Backward compatibility is maintained (in both cases) by adding a new constructor which takes an additional boolean parameter ignoreKeyword. The current constructor will call this new constructor with ignoreKeyword = false.
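The constructor arrangement described above might look roughly like the following. This is a dependency-free sketch, not the code in the attached patch: `LowerCaserSketch` and its `apply` method are hypothetical, and it assumes that ignoreKeyword = true means "leave keyword terms untouched".

```java
import java.util.Locale;

class LowerCaserSketch {
    private final boolean ignoreKeyword;

    /** Existing-style constructor: delegates with ignoreKeyword = false, so old callers see no change. */
    public LowerCaserSketch() {
        this(false);
    }

    /** New constructor: when ignoreKeyword is true, keyword terms are passed through unchanged. */
    public LowerCaserSketch(boolean ignoreKeyword) {
        this.ignoreKeyword = ignoreKeyword;
    }

    /** Lowercases the term unless keyword handling is enabled and the term is a keyword. */
    public String apply(String term, boolean isKeyword) {
        if (ignoreKeyword && isKeyword) {
            return term;
        }
        return term.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(new LowerCaserSketch().apply("NASA", true));     // old behaviour
        System.out.println(new LowerCaserSketch(true).apply("NASA", true)); // keyword-aware
    }
}
```

Because the no-argument constructor passes false, existing callers get exactly the old behaviour, which is the backward-compatibility property the description relies on.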

      Patches are attached (for LowerCaseFilter and StopFilter).

      I have verified that the analysis JUnit tests run against the updated code, i.e., backward compatibility is maintained.

      Attachments

        1. lucene-3236-patch.diff
          7 kB
          Sujit Pal
        2. scan.pdf
          39 kB
          Bernhard Kraft

          People

            Assignee: Unassigned
            Reporter: Sujit Pal (sujitpal)
            Votes: 0
            Watchers: 1