Solr / SOLR-2450

Carrot2 clustering should use both its own and Solr's stop words


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: contrib - Clustering
    • Labels: None

    Description

      While using only Solr's stop words for clustering isn't a good idea (compared to indexing, clustering needs more aggressive stop word removal to get reasonable cluster labels), it would be good if Carrot2 used both its own and Solr's stop words.

      I'm not sure what the best way to implement this would be, though. My first thought was to simply load stopwords.txt from the Solr config directory and merge it with Carrot2's stop words. Then again, a better approach might be to take the stop words from the StopFilter that is actually in use. Ideally, we should also consider the per-field stop filters configured on the fields used for clustering.
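      A minimal sketch of the first option (load stopwords.txt and merge it with Carrot2's list), extended to the per-field case by taking the union of several Solr stop word sets. It uses only the JDK; the class StopWordMerge and its methods are hypothetical and not part of Solr or Carrot2, and the "Carrot2" words are hard-coded samples rather than Carrot2's real lexical resources. It assumes the plain stopwords.txt format (one word per line, lines starting with '#' are comments) and case-insensitive matching.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper (not part of Solr or Carrot2): merges Carrot2's own
 * stop words with words read from Solr-style stop word files.
 */
final class StopWordMerge {

    private StopWordMerge() {}

    /**
     * Parses a stopwords.txt-style list: one word per line; blank lines and
     * lines starting with '#' are skipped. Words are lower-cased because we
     * assume case-insensitive matching.
     */
    static Set<String> parseStopWordFile(Reader in) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (word.isEmpty() || word.startsWith("#")) {
                    continue; // skip blank lines and comments
                }
                words.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    /**
     * Unions Carrot2's stop words with any number of Solr stop word sets
     * (e.g. the global stopwords.txt plus per-field lists), preserving
     * first-seen order and dropping duplicates.
     */
    static Set<String> merge(Collection<String> carrot2StopWords,
                             List<? extends Collection<String>> solrStopWordSets) {
        Set<String> merged = new LinkedHashSet<>();
        addLowerCased(merged, carrot2StopWords);
        for (Collection<String> solrSet : solrStopWordSets) {
            addLowerCased(merged, solrSet);
        }
        return merged;
    }

    private static void addLowerCased(Set<String> target, Collection<String> words) {
        for (String w : words) {
            target.add(w.toLowerCase(Locale.ROOT));
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the global stopwords.txt and a per-field list.
        Set<String> solrGlobal = parseStopWordFile(
                new StringReader("# Solr stop words\na\nThe\n\nof\n"));
        Set<String> titleField = parseStopWordFile(new StringReader("vs\n"));
        // Hypothetical sample of Carrot2's own (more aggressive) stop words.
        List<String> carrot2 = List.of("page", "click", "the");

        Set<String> all = merge(carrot2, List.of(solrGlobal, titleField));
        System.out.println(all); // prints [page, click, the, a, of, vs]
    }
}
```

      A plain set union is the simplest merge policy: a word that is a stop word for either Solr or Carrot2 is excluded from cluster labels, which fits the goal of more aggressive stop word removal for clustering than for indexing.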



          People

            Assignee: Stanislaw Osinski
            Reporter: Stanislaw Osinski
            Votes: 0
            Watchers: 0

