Each built-in analysis component (tokenizer, char filter, or token filter factory) has an SPI name, but these names are currently not documented anywhere.
The goals of this issue:
- Define the SPI name as a static final field (NAME) in each analysis component so that users can look up the component by name via that field. This also provides compile-time safety.
- Officially document the SPI names in Javadocs.
- Add source validation rules to the ant validate-source-patterns target to ensure that all analysis components have correct field definitions and documentation.
- Look up SPI names via the new NAME fields instead of deriving them from class names.
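The difference between deriving a name from the class name and reading it from a declared NAME field can be sketched with plain reflection. The nested classes below are hypothetical stand-ins named after analysis factories, not the real classes, and their NAME values are illustrative. The suffix-stripping rule in `nameFromClassName` is a simplified assumption, not the exact loader logic. `nameFromField` also shows the kind of check a validation rule could enforce: the field must exist and be a public static final String.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Locale;

public class SpiNames {
  // Hypothetical stand-ins for analysis factories (not the real classes).
  static class LowerCaseFilterFactory {
    public static final String NAME = "lowercase";
  }

  static class WordDelimiterGraphFilterFactory {
    public static final String NAME = "wordDelimiterGraph";
  }

  // A factory that does not declare NAME; validation should reject it.
  static class MissingNameFilterFactory {}

  /** Old style (simplified assumption): strip a suffix from the simple class name, lower-case the rest. */
  static String nameFromClassName(Class<?> clazz, String suffix) {
    String simple = clazz.getSimpleName();
    if (!simple.endsWith(suffix)) {
      throw new IllegalArgumentException(simple + " does not end with " + suffix);
    }
    return simple.substring(0, simple.length() - suffix.length()).toLowerCase(Locale.ROOT);
  }

  /** New style: read the declared public static final String NAME field. */
  static String nameFromField(Class<?> clazz) {
    final Field field;
    try {
      // getField only finds public fields.
      field = clazz.getField("NAME");
    } catch (NoSuchFieldException e) {
      throw new IllegalStateException(clazz.getSimpleName() + " must declare a public NAME field", e);
    }
    int mod = field.getModifiers();
    if (!Modifier.isStatic(mod) || !Modifier.isFinal(mod) || field.getType() != String.class) {
      throw new IllegalStateException(clazz.getSimpleName() + ".NAME must be a static final String");
    }
    try {
      return (String) field.get(null); // static field, so no instance is needed
    } catch (IllegalAccessException e) {
      throw new IllegalStateException("Cannot read " + clazz.getSimpleName() + ".NAME", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(nameFromClassName(LowerCaseFilterFactory.class, "FilterFactory")); // lowercase
    System.out.println(nameFromField(WordDelimiterGraphFilterFactory.class)); // wordDelimiterGraph
    try {
      nameFromField(MissingNameFilterFactory.class);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With an explicit field, the SPI name no longer depends on how the class happens to be named, and a missing or malformed declaration fails loudly instead of silently producing a derived name.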
For quick reference, we currently have:
- 19 Tokenizers (TokenizerFactory.availableTokenizers())
- 6 CharFilters (CharFilterFactory.availableCharFilters())
- 118 TokenFilters (TokenFilterFactory.availableTokenFilters())
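How a name-keyed lookup could be backed by the NAME constants can be sketched with a small in-memory registry. `FactoryRegistry`, `register`, `forName`, and `available` here are hypothetical names for this sketch, not the real `TokenFilterFactory` API; `available()` merely plays the role of methods such as `availableTokenFilters()`. Passing `LowerCaseFilterFactory.NAME` instead of a string literal is what gives compile-time safety: a typo in the constant name fails to compile, whereas a typo in a literal only fails at lookup time.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Supplier;

public class FactoryRegistry {
  // Hypothetical factory interface and stand-ins (not the real analysis API).
  interface Factory {
    String describe();
  }

  static final class LowerCaseFilterFactory implements Factory {
    public static final String NAME = "lowercase";
    public String describe() { return "lower-cases tokens"; }
  }

  static final class StopFilterFactory implements Factory {
    public static final String NAME = "stop";
    public String describe() { return "removes stop words"; }
  }

  // TreeMap keeps the registered names sorted.
  private final Map<String, Supplier<Factory>> byName = new TreeMap<>();

  void register(String name, Supplier<Factory> supplier) {
    if (byName.putIfAbsent(name, supplier) != null) {
      throw new IllegalArgumentException("Duplicate SPI name: " + name);
    }
  }

  Factory forName(String name) {
    Supplier<Factory> supplier = byName.get(name);
    if (supplier == null) {
      throw new IllegalArgumentException("No factory named '" + name + "'; available: " + available());
    }
    return supplier.get();
  }

  // Sorted, read-only view of all registered names.
  Set<String> available() {
    return Collections.unmodifiableSet(byName.keySet());
  }

  public static void main(String[] args) {
    FactoryRegistry registry = new FactoryRegistry();
    registry.register(LowerCaseFilterFactory.NAME, LowerCaseFilterFactory::new);
    registry.register(StopFilterFactory.NAME, StopFilterFactory::new);
    System.out.println(registry.available()); // [lowercase, stop]
    System.out.println(registry.forName(LowerCaseFilterFactory.NAME).describe()); // lower-cases tokens
  }
}
```

Rejecting duplicate names at registration time is a design choice of this sketch; it surfaces collisions between components early rather than letting one silently shadow another.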