  1. Lucene - Core
  2. LUCENE-7342

WordDelimiterFilter should observe KeywordAttribute to pass these tokens through

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • Component/s: modules/analysis
    • None
    • New

    Description

      I have a text-analysis requirement in which certain tokens must not be processed by WordDelimiterFilter (WDF); they should pass through that filter unchanged. Like several other TokenFilters, WDF accepts a configurable word list, but that list is static: it is loaded into a concrete CharArraySet. As a result I can't, for example, protect tokens by regexp or based on other attributes.

      A simple solution that makes sense to me is for WDF to consult KeywordAttribute to decide whether it should skip a token. KeywordAttribute seems fairly generic in how it can be used, although, granted, today only the stemmers use it. The attribute isn't named "StemmerIgnoreAttribute" or some such; it's generic, so I think it's fine for WDF to use it in a similar way.
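
      To make the proposal concrete, here is a minimal standalone sketch of the intended behavior. It is a model only, not Lucene code: Token, markKeywords and splitOnDelimiters are hypothetical names, the boolean flag stands in for KeywordAttribute, and the split rule (break on any non-alphanumeric run) is a deliberate simplification of what WDF really does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Standalone model of the proposal. None of these names are Lucene APIs;
// they are hypothetical stand-ins used only to illustrate the idea.
final class KeywordAwareSplitSketch {

    /** A token plus a flag playing the role of KeywordAttribute. */
    static final class Token {
        final String text;
        final boolean keyword;

        Token(String text, boolean keyword) {
            this.text = text;
            this.keyword = keyword;
        }
    }

    /** Upstream stage: flag every term whose full text matches the pattern. */
    static List<Token> markKeywords(List<String> terms, Pattern protect) {
        List<Token> out = new ArrayList<>();
        for (String term : terms) {
            out.add(new Token(term, protect.matcher(term).matches()));
        }
        return out;
    }

    /**
     * Delimiter stage: flagged tokens pass through untouched; all other
     * tokens are split on runs of non-alphanumeric characters.
     */
    static List<String> splitOnDelimiters(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token token : tokens) {
            if (token.keyword) {
                out.add(token.text); // proposed behavior: skip processing
                continue;
            }
            for (String part : token.text.split("[^\\p{Alnum}]+")) {
                if (!part.isEmpty()) {
                    out.add(part);
                }
            }
        }
        return out;
    }
}
```

      With the terms "wi-fi", "SKU-1234" and "power_shot" and the protect pattern SKU-\d+, only "SKU-1234" is flagged, so the delimiter stage emits wi, fi, SKU-1234, power, shot. In Lucene itself the flagging step would not need new code: existing marker filters such as SetKeywordMarkerFilter and PatternKeywordMarkerFilter already set KeywordAttribute upstream, which is what makes the attribute a convenient hook for WDF.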

          People

            Assignee: Unassigned
            Reporter: David Smiley (dsmiley)
            Votes: 0
            Watchers: 1
