Details
- Type: Improvement
- Status: Open
- Priority: Trivial
- Resolution: Unresolved
Description
Solr's default configset ships with a collection of sample stopword files from the Snowball project in solr/server/solr/configsets/_default/conf/lang (https://github.com/apache/solr/tree/a42c605fb916439222a086356f368f02cf80304a/solr/server/solr/configsets/_default/conf/lang)
The Lucene repository contains a similar set of stopword files, but those have been updated to a more recent version of the Snowball lists (https://github.com/apache/lucene/tree/main/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball)
Specifically, the most recent French stopword list no longer contains a number of words that are homonyms of useful words and therefore should not be filtered out.
In a discussion on the solr-users mailing list, it was agreed that it would be a good idea to sync the stopword files in Solr with the ones in Lucene.
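As a rough sketch of how the two copies of a stopword file could be compared before syncing: the function names and the sample data below are hypothetical, and the parser assumes the Snowball file format, in which `|` begins a comment and a line may hold several whitespace-separated words.

```python
def parse_snowball_stopwords(text):
    """Return the set of words in Snowball-format stopword text.

    Assumed format: '|' starts a comment that runs to the end of the line,
    and a line may contain several whitespace-separated words.
    """
    words = set()
    for line in text.splitlines():
        content = line.split("|", 1)[0]  # drop the trailing comment, if any
        words.update(content.split())    # blank lines contribute nothing
    return words


def diff_stopwords(old_text, new_text):
    """Return (removed, added) word lists going from old_text to new_text."""
    old = parse_snowball_stopwords(old_text)
    new = parse_snowball_stopwords(new_text)
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    # Made-up sample contents, not taken from the real stopword files.
    solr_copy = "alpha | a comment\nbeta gamma\n"
    lucene_copy = "alpha\ngamma\ndelta\n"
    removed, added = diff_stopwords(solr_copy, lucene_copy)
    print("only in Solr copy:", removed)
    print("only in Lucene copy:", added)
```

In practice the two strings would be read from the matching files under the Solr and Lucene paths above, one language at a time, so the removed words can be reviewed before the files are replaced.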