  1. Solr
  2. SOLR-17346

Synchronise default configset stopwords to the same list as Lucene

Details

    • Type: Improvement
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved

    Description

      Solr's default configset ships with a collection of sample stopword files from the Snowball project in solr/server/solr/configsets/_default/conf/lang (https://github.com/apache/solr/tree/a42c605fb916439222a086356f368f02cf80304a/solr/server/solr/configsets/_default/conf/lang).

      The Lucene repository contains a similar set of stopword files, but these have been updated to a more recent version of the Snowball lists (https://github.com/apache/lucene/tree/main/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball).

      Specifically, the most recent French stopword list no longer includes a number of words that are homonyms of other, useful words which should not be skipped.

      In a discussion on the solr-users mailing list, it was agreed that it would be a good idea to sync the stopword files in Solr with the ones in Lucene.
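
To see how far the two copies have drifted for a given language, something like the following sketch could be used. It is not part of any Solr or Lucene tooling. It assumes both files use the Snowball stopword format, where `|` starts a comment and words are separated by whitespace. The function names and the checkout paths in the usage note are hypothetical.

```python
from pathlib import Path


def parse_snowball_stopwords(text: str) -> set[str]:
    """Parse Snowball-format stopword text.

    Assumption: '|' starts a comment that runs to the end of the line,
    and the remaining content is a whitespace-separated list of words.
    """
    words: set[str] = set()
    for line in text.splitlines():
        content = line.split("|", 1)[0]  # drop the trailing comment, if any
        words.update(content.split())    # blank lines contribute nothing
    return words


def diff_stopwords(solr_text: str, lucene_text: str) -> tuple[list[str], list[str]]:
    """Return (words only in the Solr list, words only in the Lucene list)."""
    solr_words = parse_snowball_stopwords(solr_text)
    lucene_words = parse_snowball_stopwords(lucene_text)
    return sorted(solr_words - lucene_words), sorted(lucene_words - solr_words)


def diff_stopword_files(solr_file: Path, lucene_file: Path) -> tuple[list[str], list[str]]:
    """Convenience wrapper that reads both files as UTF-8 before diffing."""
    return diff_stopwords(
        solr_file.read_text(encoding="utf-8"),
        lucene_file.read_text(encoding="utf-8"),
    )
```

For example, with local checkouts of both repositories (paths and file names here are assumptions and should be checked against the actual trees), `diff_stopword_files(Path("solr/server/solr/configsets/_default/conf/lang/stopwords_fr.txt"), Path("lucene/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt"))` would list the French words present in only one of the two files.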

            People

              Assignee: Unassigned
              Reporter: Alastair Porter (alastairp)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 10m