[SOLR-2450] Carrot2 clustering should use both its own and Solr's stop words - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.2, 4.0-ALPHA
Component/s: contrib - Clustering
Labels: None

Description

While using only Solr's stop words for clustering isn't a good idea (compared to indexing, clustering needs more aggressive stop word removal to get reasonable cluster labels), it would be good if Carrot2 used both its own and Solr's stop words.

I'm not sure what the best way to implement this would be, though. My first thought was to simply load stopwords.txt from the Solr config dir and merge its contents with Carrot2's stop words. But then, maybe a better approach would be to get the stop words from the StopFilter being used? Ideally, we should also consider the per-field stop filters configured on the fields used for clustering.
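
To make the first idea concrete, here is a minimal sketch of merging two stop word lists in plain Java. It is not the approach taken in the attached patch, and it uses neither Solr nor Carrot2 APIs: the class `StopWordMerger` and its methods are hypothetical names, and the file conventions it assumes (one word per line, `#` starting a comment line) are an assumption about the stopwords.txt layout.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Solr or Carrot2.
class StopWordMerger {

    // Parses stop word lines: trims each line, skips blank lines and
    // lines starting with '#' (assumed comment marker), and lowercases.
    static Set<String> parse(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            String word = line.trim();
            if (word.isEmpty() || word.startsWith("#")) {
                continue;
            }
            words.add(word.toLowerCase(Locale.ROOT));
        }
        return words;
    }

    // Returns the union of both sets, keeping first-seen order.
    static Set<String> merge(Set<String> first, Set<String> second) {
        Set<String> merged = new LinkedHashSet<>(first);
        merged.addAll(second);
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in for the lines of Solr's stopwords.txt.
        Set<String> solrWords = parse(Arrays.asList("# Solr stop words", "The", "", "  and  "));
        // Stand-in for the clustering engine's own list.
        Set<String> clusteringWords = parse(Arrays.asList("and", "about"));
        System.out.println(merge(solrWords, clusteringWords)); // prints [the, and, about]
    }
}
```

In a real integration the lines would come from the Solr config dir (or from the configured stop filter), and the merged set would have to be handed to the clustering engine through whatever lexical resource mechanism it exposes; both parts are left out here.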

Attachments

SOLR-2450.patch, 02/Apr/11 18:11, 14 kB, Stanislaw Osinski

Issue Links

depends upon: SOLR-2448 Upgrade Carrot2 to version 3.5.0 (Closed)

People

Assignee: Stanislaw Osinski

Reporter: Stanislaw Osinski

Votes: 0

Watchers: 0

Dates

Created: 30/Mar/11 17:32

Updated: 02/May/13 02:29

Resolved: 16/May/11 16:30
