Description
The Lucene 3.x StandardTokenizer with UAX#29 support improves tokenization of non-English text. At present it can be invoked by using StandardTokenizerFactory and setting the Version to 3.1. However, it would be useful to be able to use the improved Unicode processing without also including the IP address and email address processing of StandardAnalyzer. A FilterFactory that allows the StandardTokenizer with UAX#29 support to be used on its own would be useful.
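For reference, the current invocation described above might look roughly like the following schema.xml fragment. This is a sketch only: the field type name `text_uax29` is hypothetical, and it assumes the per-component `luceneMatchVersion` attribute is the mechanism used to set the Version to 3.1.

```xml
<!-- Hypothetical field type; the name "text_uax29" is illustrative only. -->
<fieldType name="text_uax29" class="solr.TextField">
  <analyzer>
    <!-- Assumption: the factory takes its Version from luceneMatchVersion,
         so LUCENE_31 selects the UAX#29-based StandardTokenizer behavior. -->
    <tokenizer class="solr.StandardTokenizerFactory" luceneMatchVersion="LUCENE_31"/>
  </analyzer>
</fieldType>
```

With this setup the tokenizer's behavior is tied to the Version setting, which is the coupling this request would like to avoid.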
Attachments
Issue Links
- is related to LUCENE-2763 Swap URL+Email recognizing StandardTokenizer and UAX29Tokenizer (Closed)