Description
Spinoff from LUCENE-7314:
StandardAnalyzer has progressed substantially since we broke out the analyzers module ... it now follows a real Unicode standard (UAX #29, Unicode Text Segmentation). It's also much faster than it used to be, since it switched to JFlex a while back, and it has picked up many bug fixes along the way.
I think it would make a good default for most Lucene users. We should graduate it from the analyzers module into core and make it the default for IndexWriter.
It's really quite crazy that users must go digging in the analyzers module to get started with Lucene ... we don't make them dig through the codecs module to find a good default codec ...
Issue Links
- is related to: LUCENE-7444 Remove English stopwords default from StandardAnalyzer in Lucene-Core (Closed)