Details
- New Feature
- Status: Closed
- Major
- Resolution: Fixed
- None
- None
- None
- New
Description
The current framework for handling term normalization relies on instanceof checks for MultiTermAwareComponent, followed by casts. MultiTermAwareComponent itself deals in AbstractAnalysisFactory instances, so callers need to cast to the correct component type before use, which invites misuse.
We should re-organise all this to be type-safe and usable without casts. One possibility is to add `normalize` methods to CharFilterFactory and TokenFilterFactory that mirror their existing `create` methods. The default implementation would return the input unchanged, while filters that should apply at normalization time can delegate to `create`.
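A minimal sketch of that shape, for illustration only. The classes below are hypothetical stand-ins written against java.io.Reader, not the real Lucene factories; the names SketchCharFilterFactory, LowerCasingCharFilterFactory, HyphenToSpaceCharFilterFactory and the mapChars helper are assumptions made for this example. It shows a `normalize` method that defaults to a pass-through, and one filter that opts in by delegating to `create`.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.IntUnaryOperator;

// Hypothetical stand-in for a char filter factory; not the real Lucene class.
abstract class SketchCharFilterFactory {
    // Wraps the input for full analysis.
    abstract Reader create(Reader input);

    // Default: the filter takes no part in normalization, so the input
    // is returned unchanged. No instanceof check or cast is needed.
    Reader normalize(Reader input) {
        return input;
    }

    // Helper (an assumption of this sketch): applies a per-character mapping.
    static Reader mapChars(Reader in, IntUnaryOperator f) {
        return new FilterReader(in) {
            @Override
            public int read() throws IOException {
                int c = super.read();
                return c == -1 ? -1 : f.applyAsInt(c);
            }

            @Override
            public int read(char[] buf, int off, int len) throws IOException {
                int n = super.read(buf, off, len);
                for (int i = off; i < off + n; i++) {
                    buf[i] = (char) f.applyAsInt(buf[i]);
                }
                return n;
            }
        };
    }
}

// Lowercasing should also apply at normalization time, so normalize delegates to create.
class LowerCasingCharFilterFactory extends SketchCharFilterFactory {
    @Override
    Reader create(Reader input) {
        return mapChars(input, c -> Character.toLowerCase(c));
    }

    @Override
    Reader normalize(Reader input) {
        return create(input);
    }
}

// Hypothetical filter that only applies during full analysis; it keeps the default normalize.
class HyphenToSpaceCharFilterFactory extends SketchCharFilterFactory {
    @Override
    Reader create(Reader input) {
        return mapChars(input, c -> c == '-' ? ' ' : c);
    }
}

class NormalizeSketch {
    // Drains a reader into a String, converting IOException to an unchecked one.
    static String readAll(Reader r) {
        try {
            StringBuilder sb = new StringBuilder();
            for (int c = r.read(); c != -1; c = r.read()) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SketchCharFilterFactory lower = new LowerCasingCharFilterFactory();
        SketchCharFilterFactory hyphen = new HyphenToSpaceCharFilterFactory();
        System.out.println(readAll(lower.normalize(new StringReader("Foo-BAR"))));
        System.out.println(readAll(hyphen.normalize(new StringReader("Foo-BAR"))));
    }
}
```

With this arrangement a caller can run every factory in a chain through `normalize` uniformly; filters that do not opt in simply hand back their input. The same pattern would carry over to a token filter factory operating on token streams instead of readers.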
Related to this, we should deprecate and remove LowerCaseTokenizer, which combines tokenization and normalization in a way that will break this API.
Attachments
Issue Links
- relates to
- LUCENE-8731 mark MultiTermAwareComponent as deprecated (7.x and 7.7 only)
- Closed
- links to