  Lucene - Core / LUCENE-3236

Make LowerCaseFilter and StopFilter keyword aware, similar to PorterStemFilter


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.9, 6.0
    • Component/s: modules/analysis
    • Labels:
    • Environment: N/A

    • Lucene Fields: New, Patch Available

      Description

      PorterStemFilter has functionality to detect if a term has been marked as a "keyword" by the KeywordMarkerFilter (KeywordAttribute.isKeyword() == true), and if so, skip stemming.
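      As a rough illustration of the check described above, the following self-contained Java model skips a transformation when a token carries the keyword flag. It is not Lucene's API: the `Token` record stands in for a term plus its KeywordAttribute, `markKeywords` plays the role of KeywordMarkerFilter, and `toyStem` is a hypothetical suffix stripper used in place of the real Porter algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Self-contained model of keyword-aware stemming; NOT the Lucene API.
class KeywordStemDemo {
    // Stand-in for a token: the term text plus a KeywordAttribute-like flag.
    record Token(String term, boolean keyword) {}

    // Plays the role of KeywordMarkerFilter: flag terms found in the protected set.
    static List<Token> markKeywords(List<String> terms, Set<String> protectedTerms) {
        List<Token> out = new ArrayList<>();
        for (String t : terms) {
            out.add(new Token(t, protectedTerms.contains(t)));
        }
        return out;
    }

    // Hypothetical toy stemmer (strips one trailing "s"); NOT the Porter algorithm.
    static String toyStem(String term) {
        return term.length() > 3 && term.endsWith("s")
                ? term.substring(0, term.length() - 1)
                : term;
    }

    // Mirrors the keyword check: keyword-marked tokens bypass stemming entirely.
    static List<String> stemAll(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token tok : tokens) {
            out.add(tok.keyword() ? tok.term() : toyStem(tok.term()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = markKeywords(List.of("cats", "news", "dogs"), Set.of("news"));
        System.out.println(stemAll(tokens)); // prints [cat, news, dog]
    }
}
```

      Only the keyword branch in `stemAll` is the point of the model; the stemming itself is deliberately trivial.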

      The suggestion is to add the same functionality to other filters where it applies. I think it is particularly useful for LowerCaseFilter (i.e., if a term is a keyword, leave its case alone) and StopFilter (i.e., if a term is a keyword, do not remove it even if it matches a stop word).
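      A hedged sketch of the proposed behavior, again as a self-contained model rather than Lucene code: `lowerCase` and `removeStopWords` are hypothetical stand-ins for LowerCaseFilter and StopFilter, and `respectKeywords` is an illustrative switch, not a real parameter. It shows why keyword awareness matters: without it, a keyword such as "US" is lowercased to "us" and then removed as a stop word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Self-contained model of the proposed filters; NOT the Lucene API.
class KeywordAwareChainDemo {
    // Stand-in for a token: the term text plus a KeywordAttribute-like flag.
    record Token(String term, boolean keyword) {}

    // Proposed LowerCaseFilter behavior: leave keyword-marked tokens untouched.
    static List<Token> lowerCase(List<Token> in, boolean respectKeywords) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            boolean skip = respectKeywords && t.keyword();
            out.add(skip ? t : new Token(t.term().toLowerCase(Locale.ROOT), t.keyword()));
        }
        return out;
    }

    // Proposed StopFilter behavior: never drop a keyword-marked token.
    static List<Token> removeStopWords(List<Token> in, Set<String> stopWords, boolean respectKeywords) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            boolean keep = (respectKeywords && t.keyword()) || !stopWords.contains(t.term());
            if (keep) {
                out.add(t);
            }
        }
        return out;
    }

    // Runs lowercase, then stop-word removal, and returns the surviving terms.
    static List<String> analyze(List<Token> in, Set<String> stopWords, boolean respectKeywords) {
        List<String> terms = new ArrayList<>();
        for (Token t : removeStopWords(lowerCase(in, respectKeywords), stopWords, respectKeywords)) {
            terms.add(t.term());
        }
        return terms;
    }

    public static void main(String[] args) {
        // "US" (the country) is marked as a keyword; "us" (the pronoun) is not.
        List<Token> tokens = List.of(
                new Token("The", false), new Token("US", true),
                new Token("and", false), new Token("us", false));
        Set<String> stop = Set.of("the", "and", "us");
        System.out.println(analyze(tokens, stop, false)); // prints []
        System.out.println(analyze(tokens, stop, true));  // prints [US]
    }
}
```

      With `respectKeywords` set to false the chain behaves like the current filters and every token is dropped; with it set to true only the keyword-marked "US" survives.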

      Backward compatibility is maintained (in both cases) by adding a new constructor that takes an additional boolean parameter, ignoreKeyword; when it is true, the filter leaves keyword-marked terms alone. The existing constructor delegates to this new constructor with ignoreKeyword = false, so current behavior is unchanged.
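      The constructor-chaining pattern described above might look roughly like this sketch. `KeywordAwareLowerCase` and its `filter` method are hypothetical stand-ins, not the code in the attached patch and not Lucene's real LowerCaseFilter (which wraps a TokenStream); only the `ignoreKeyword` name comes from the description, and the sketch assumes `true` means keyword-marked tokens pass through unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a filter gaining an opt-in ignoreKeyword flag; NOT the Lucene API.
class KeywordAwareLowerCase {
    // Stand-in for a token: the term text plus a KeywordAttribute-like flag.
    record Token(String term, boolean keyword) {}

    private final boolean ignoreKeyword;

    // Existing constructor: delegates with ignoreKeyword = false, so old callers see no change.
    KeywordAwareLowerCase() {
        this(false);
    }

    // New constructor: when ignoreKeyword is true, keyword-marked tokens keep their case (assumed semantics).
    KeywordAwareLowerCase(boolean ignoreKeyword) {
        this.ignoreKeyword = ignoreKeyword;
    }

    List<String> filter(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            out.add(ignoreKeyword && t.keyword() ? t.term() : t.term().toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(new Token("iPhone", true), new Token("Apple", false));
        System.out.println(new KeywordAwareLowerCase().filter(tokens));     // prints [iphone, apple]
        System.out.println(new KeywordAwareLowerCase(true).filter(tokens)); // prints [iPhone, apple]
    }
}
```

      Delegating from the old constructor keeps a single code path, so an existing test suite run against the no-argument constructor exercises exactly the pre-patch behavior.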

      Patches are attached (for LowerCaseFilter and StopFilter).

      I have verified that the analysis JUnit tests pass against the updated code, i.e., backward compatibility is maintained.

        Attachments

        1. lucene-3236-patch.diff (7 kB, Sujit Pal)
        2. scan.pdf (39 kB, Bernhard Kraft)


            People

            • Assignee: Unassigned
            • Reporter: Sujit Pal (sujitpal)
            • Votes: 0
            • Watchers: 1

              Dates

              • Created:
                Updated: