[LUCENE-7444] Remove English stopwords default from StandardAnalyzer in Lucene-Core - ASF JIRA

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 6.2
Fix Version/s: 8.0
Component/s: core/other, modules/analysis
Labels: None
Lucene Fields: New

Description

Yonik said on ~~LUCENE-7318~~:

I think it would make a good default for most Lucene users, and we should graduate it from the analyzers module into core, and make it the default for IndexWriter.

This "StandardAnalyzer" is specific to English, as it removes English stopwords.
That seems to be an odd choice now for a few reasons:

- It was argued in the past (rather vehemently) that Solr should not prefer English in its default "text" field.
- AFAIK, removing stopwords is no longer considered best practice.

Given that removal of English stopwords is the only thing that really makes this analyzer English-centric (and given the negative impact that can have on other languages), it seems like the stopword filter should be removed from StandardAnalyzer.
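
To make the concern above concrete, here is a minimal sketch of what an English stop filter does to input text. It deliberately uses no Lucene classes: `StopwordSketch`, `analyze`, and the word list are hypothetical stand-ins, the tokenization is a crude split on non-alphanumeric characters, and the stopword set is only a small illustrative subset of typical English stopwords.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordSketch {
    // Illustrative subset of English stopwords (an assumption for this sketch,
    // not the list shipped with any Lucene release).
    static final Set<String> ENGLISH_STOPWORDS =
            Set.of("a", "an", "and", "be", "is", "not", "or", "the", "to");

    // Splits on non-alphanumeric characters, lowercases each token,
    // and drops every token contained in the given stopword set.
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator yields an empty first element
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopwords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // An English phrase made up entirely of stopwords disappears.
        System.out.println(analyze("To be or not to be", ENGLISH_STOPWORDS)); // []
        // French "or" (gold) is dropped because it collides with English "or".
        System.out.println(analyze("bague en or", ENGLISH_STOPWORDS)); // [bague, en]
        // With an empty stop set, every token is kept.
        System.out.println(analyze("bague en or", Set.of())); // [bague, en, or]
    }
}
```

The second call shows the kind of cross-language side effect mentioned above: a content word in one language is silently removed because it happens to match an English stopword.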

When trying to fix the backwards incompatibility issues in ~~LUCENE-7318~~, it looks like most of the unrelated code moved from the analysis module to core (with changed package names!) was related to word list loading, CharArraySets, and superclasses of StopFilter. If we follow Yonik's suggestion, we can revert all those changes. I agree with him: a "universal" analyzer should not have any language-specific stop words.

The other thing is LowerCaseFilter, but I'd suggest simply adding a clone of it to Lucene core and leaving the analysis module self-contained.

Attachments

LUCENE-7444.patch (2 kB, attached 13/Jun/18 10:35 by Alan Woodward)

Issue Links

relates to: LUCENE-7318 Graduate StandardAnalyzer out of analyzers module into core (Closed)

People

Assignee: Alan Woodward
Reporter: Uwe Schindler
Votes: 0
Watchers: 6

Dates

Created: 11/Sep/16 09:31
Updated: 28/Aug/22 15:03
Resolved: 13/Jun/18 11:08