SOLR-769: Support Document and Search Result clustering

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: contrib - Clustering
    • Labels: None

      Description

      Clustering is a useful tool for working with documents and search results, similar to the notion of dynamic faceting. Carrot2 (http://project.carrot2.org/) is a nice, BSD-licensed, library for doing search results clustering. Mahout (http://lucene.apache.org/mahout) is well suited for whole-corpus clustering.

      The patch lays out a contrib module that starts off with an integration of a SearchComponent for doing clustering and an implementation using Carrot2. In search results mode, it will use the DocList as the input for the clustering. While Carrot2 comes with a Solr input component, it is not the same as the SearchComponent that I have, in that the Carrot2 example actually submits a query to Solr, whereas my SearchComponent is just chained into the component list and uses the ResponseBuilder to add in the cluster results.

      While not fully fleshed out yet, the collection-based mode will take in a list of ids or just use the whole collection and will produce clusters. Since this is a longer, typically offline task, there will need to be some type of storage mechanism (and replication?) for the clusters. I may push this off to a separate JIRA issue, but I at least want to present the use case as part of the design of this component/contrib. It may even make sense that we split this out, such that the building piece is something like an UpdateProcessor and then the SearchComponent just acts as a lookup mechanism.
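      For orientation, a clustering SearchComponent of this kind would be registered in solrconfig.xml and chained into a request handler's component list. The sketch below is illustrative only — the parameter shown and the algorithm class are assumptions, not necessarily what the final patch uses (ClusteringComponent is the class name that ended up in the contrib; LingoClusteringAlgorithm is a real Carrot2 algorithm):

```xml
<!-- Hypothetical configuration sketch; consult the contrib's README/wiki
     for the exact names shipped with the patch. -->
<searchComponent name="clustering"
                 class="org.apache.solr.handler.clustering.ClusteringComponent">
  <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
</searchComponent>

<requestHandler name="/clustering" class="solr.SearchHandler">
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
```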

      1. subcluster-flattening.patch
        1 kB
        Stanislaw Osinski
      2. SOLR-769.patch
        13 kB
        Yonik Seeley
      3. SOLR-769.patch
        10 kB
        Yonik Seeley
      4. clustering-componet-shard.patch
        21 kB
        Brad Giaccio
      5. SOLR-769-analyzerClass.patch
        3 kB
        Koji Sekiguchi
      6. SOLR-769.patch
        177 kB
        Grant Ingersoll
      7. SOLR-769.tar
        1.87 MB
        Grant Ingersoll
      8. SOLR-769.patch
        177 kB
        Grant Ingersoll
      9. SOLR-769.patch
        122 kB
        Grant Ingersoll
      10. SOLR-769.zip
        42 kB
        Stanislaw Osinski
      11. SOLR-769-lib.zip
        1.68 MB
        Stanislaw Osinski
      12. SOLR-769.patch
        164 kB
        Grant Ingersoll
      13. SOLR-769.patch
        187 kB
        Grant Ingersoll
      14. SOLR-769.patch
        193 kB
        Grant Ingersoll
      15. SOLR-769.patch
        191 kB
        Grant Ingersoll
      16. SOLR-769.patch
        183 kB
        Grant Ingersoll
      17. clustering-libs.tar
        1.87 MB
        Grant Ingersoll
      18. SOLR-769.patch
        193 kB
        Grant Ingersoll
      19. SOLR-769.patch
        150 kB
        Grant Ingersoll
      20. SOLR-769.patch
        104 kB
        Grant Ingersoll
      21. clustering-libs.tar
        1.54 MB
        Grant Ingersoll

        Issue Links

          Activity

          Koji Sekiguchi added a comment -

          Uh, I needed to read the part of the recursive call. Thanks for explanation!

          Stanislaw Osinski added a comment -

          Hi Koji,

          Actually, the current code seems right: if we don't output subclusters, we need to include all documents of the cluster, including those from its subclusters, otherwise the subclusters' documents may not appear in the response at all. But if we do output subclusters, we add only the documents assigned specifically to the cluster because the subclusters with their documents will be included in the response too.

          S.

          Koji Sekiguchi added a comment -

          Apologies Grant for quoting your comment on 27/Jul/09:

          Also applied Stanislaw's patch.

          I'm confused by this line:

          List<Document> docs = outputSubClusters ? outCluster.getDocuments() : outCluster.getAllDocuments();
          

          According to Carrot2 Javadoc:

          http://download.carrot2.org/stable/javadoc/org/carrot2/core/Cluster.html#getAllDocuments%28%29

          Should it be:

          List<Document> docs = outputSubClusters ? outCluster.getAllDocuments() : outCluster.getDocuments();
          

          ?

          Koji Sekiguchi made changes -
          Component/s contrib - Clustering [ 12313050 ]
          Koji Sekiguchi added a comment -

          add component info

          Stanislaw Osinski made changes -
          Link This issue is related to SOLR-1314 [ SOLR-1314 ]
          Stanislaw Osinski added a comment -

          Created: SOLR-1314. I'll attach a patch there as soon as Lucene 2.9 is released.

          Grant Ingersoll added a comment -

          Would that make sense? Should I create a separate issue for it, or rather reopen this one?

          Yes, I think that makes sense. Separate issue would be good, this one is long enough.

          Stanislaw Osinski added a comment -

          Hi Grant,

          There's one more thing: we're planning to release version 3.1.0 of Carrot2 with certain bug fixes in the clustering algorithms and better support for Chinese (using the new analyzer from Lucene). Our plan is to release after Lucene 2.9 is out, but before Solr 1.4, so that the latter would have a newer version of Carrot2 on board (should be just a matter of replacing the Carrot2 JAR / upgrading the version of the downloaded dependency). Would that make sense? Should I create a separate issue for it, or rather reopen this one?

          Thanks,

          S.

          Grant Ingersoll made changes -
          Status In Progress [ 3 ] Closed [ 6 ]
          Resolution Fixed [ 1 ]
          Grant Ingersoll added a comment -

          This should be back to working and the example is now contained in contrib/clustering, plus I re-instated the downloads directory.

          Grant Ingersoll added a comment -

          OK, I have committed my changes and believe functionality is restored and is properly working with the SolrResourceLoader. Also applied Stanislaw's patch.

          Still likely need to review how to distribute all of this. My guess is that we should only include the source, including the build and instructions for installing, and not even package jars at all since we can't include the LGPL ones necessary for Carrot2.

          Grant Ingersoll added a comment -

          Note, I believe there is also a classloading issue when trying to load the Carrot2 algorithm, because it does not use the SolrResourceLoader.

          Grant Ingersoll added a comment -

          classloading issues after the handler was removed from solr.war

          I think the issue is that the changes you made don't actually include the clustering code in Solr when running the example. I think we just need to copy over the clustering JAR from the build directory into the lib, but that is a bit weird, IMO.

          To fix, I'm going to make the example target create a proper Solr home under contrib/clustering/example. Which, of course, isn't much different from how it used to be. I am also going to restore the downloads directory for packaging/release functionality.

          Grant Ingersoll made changes -
          Assignee Grant Ingersoll [ gsingers ]
          Yonik Seeley made changes -
          Assignee Yonik Seeley [ yseeley@gmail.com ]
          Yonik Seeley added a comment - edited

          un-assigning myself since I'm not sure when I'll be able to get back to this.
          Issues remaining:

          • classloading issues after the handler was removed from solr.war
          • possible packaging issues that Grant brought up (the downloaded jars shouldn't be shipped)
          • update the Wiki once classloading works and we can generate the new example output
          Stanislaw Osinski made changes -
          Attachment subcluster-flattening.patch [ 12412899 ]
          Stanislaw Osinski added a comment -

          Hi,

          While configuring the clustering component for an algorithm that returns hierarchical clusters, it took me a while to debug why subclusters wouldn't appear on the output. It turned out that the default value for the carrot.outputSubClusters parameter is false, which was the opposite of what I assumed. Would it be a problem to change the default to true, so that other users avoid the same problem?

          Another improvement worth making for the carrot.outputSubClusters = false case is "flattening" the clusters: returning all documents of the 1st level clusters, including those contained in the subclusters the user chose not to output. Without this improvement, many document-cluster assignments may be lost because some Carrot2 algorithms will assign documents only to the "leaf" (deepest in the hierarchy) clusters.

          I'm attaching a patch that implements both changes.
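          To make the flattening behavior concrete, here is a minimal, self-contained sketch. The Cluster class below is a simplified stand-in for org.carrot2.core.Cluster (not the real implementation), showing how getDocuments() and getAllDocuments() differ and how the carrot.outputSubClusters flag would select between them:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for Carrot2's Cluster, for illustration only.
class Cluster {
    final List<String> documents = new ArrayList<>();
    final List<Cluster> subclusters = new ArrayList<>();

    // Documents assigned directly to this cluster.
    List<String> getDocuments() { return documents; }

    // This cluster's documents plus those of all subclusters, recursively --
    // what is wanted when subclusters are NOT output ("flattening").
    List<String> getAllDocuments() {
        List<String> all = new ArrayList<>(documents);
        for (Cluster sub : subclusters) {
            all.addAll(sub.getAllDocuments());
        }
        return all;
    }
}

public class FlatteningDemo {
    public static void main(String[] args) {
        Cluster leaf = new Cluster();
        leaf.documents.add("doc2");
        Cluster root = new Cluster();
        root.documents.add("doc1");
        root.subclusters.add(leaf);

        boolean outputSubClusters = false;
        // The line discussed in this thread: when subclusters are suppressed,
        // fold their documents into the parent; otherwise list only direct ones
        // (the subclusters carry their own documents in the response).
        List<String> docs = outputSubClusters
                ? root.getDocuments()     // subclusters rendered separately
                : root.getAllDocuments(); // flattened: doc2 folded into root
        System.out.println(docs);         // [doc1, doc2]
    }
}
```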

          Yonik Seeley added a comment -

          Apologies Brad - I didn't realize there were pending patches or I would have not done the reformat.

          Brad Giaccio added a comment -

          If you could, could my patch to handle shards be applied before you reformat, so I don't have to piece it together again and resubmit?

          Yonik Seeley added a comment -

          Of course, now that I've removed the clustering libs from the solr.war, the example no longer works for some reason... looks like all the jars are in example/clustering/solr/lib, so it's classloading issues I imagine.

          On a related note, I'm not sure how useful it is to have a clustering component with multiple plugins itself... the extra level of plugins seems to just add more complexity. Different plugins could always share utility classes, perhaps even base classes, and could strive for a common output format - all without going to an additional plugin model.

          Mark Miller added a comment -

          Anyone mind if I reformat the source files that currently use tabs?

          +1

          Yonik Seeley added a comment -

          Anyone mind if I reformat the source files that currently use tabs?

          Yonik Seeley made changes -
          Attachment SOLR-769.patch [ 12412312 ]
          Yonik Seeley added a comment -

          This fixes the SolrQueryRequest issue and also stopped the swallowing of an exception that I just happened to see.

          I'll commit shortly.

          Yonik Seeley made changes -
          Attachment SOLR-769.patch [ 12412290 ]
          Yonik Seeley added a comment -

          The attached patch implements the simpler JSON friendly format.

          example:

          [...] 
          "clusters":[
            { "labels":["DDR"],
              "docs":["TWINX2048-3200PRO","VS1GB400C3","VDBDB1A16"]
            },
            { "labels":["Car Power Adapter"],
              "docs":["F8V7067-APL-KIT","IW-02"]
            },
            { "labels":["Display"],
              "docs":["MA147LL/A","VA902B"]
            }
          
          Stanislaw Osinski added a comment -

          Is "labels" needed because there could be multiple labels per cluster in the future? (I assume yes)

          Correct. Currently neither of Carrot2's algorithms creates clusters with multiple labels, but it's quite likely that there are other algorithms that can do that.

          Grant Ingersoll added a comment -

          Makes sense, might need to refactor some of the initialization code and the abstract clustering engine, but no big deal.

          Yonik Seeley added a comment -

          I'm talking about the search results clustering, which is per-request. RequestHandlers should pretty much always use the core/searcher associated with the SolrQueryRequest. newSearcher/firstSearcher hooks set this themselves, hence it's a different searcher than one would get from getSearcher() (and could possibly even cause a deadlock). Architecturally, there could be any number of reasons to use a different searcher in the future... the SolrQueryRequest says which searcher to use.

          Grant Ingersoll added a comment -

          Also, some implementations may need lower level interfaces than Searcher, it just seems easier to have core access.

          Grant Ingersoll added a comment -

          Now that I'm looking at some of the code, is there a reason why clustering doesn't use a SolrQueryRequest, but instead grabs a searcher directly from the core?

          Because the clustering engine gets initialized during core initialization and thus doesn't have a SolrQueryRequest at that time. Is there harm in the way it's being done? I suppose it adds an extra reference, right, meaning it could keep a core open longer?

          In the case of document clustering, I think it could be a long running job. It's not clear yet how that should work, but it is something to keep in mind. I expect to implement that sometime this summer, likely after 1.4.

          Yonik Seeley added a comment -

          Now that I'm looking at some of the code, is there a reason why clustering doesn't use a SolrQueryRequest, but instead grabs a searcher directly from the core?

          Yonik Seeley made changes -
          Assignee Grant Ingersoll [ gsingers ] Yonik Seeley [ yseeley@gmail.com ]
          Grant Ingersoll added a comment -

          Is "labels" needed because there could be multiple labels per cluster in the future? (I assume yes)

          Not sure, but likely so

          Do we need more per-doc information than just the id? (I assume no)

          I think for other algorithms like k-Means, Canopy and others (Mahout) you could reasonably expect to return:
          1. The centroid that the given document belongs to - This can be captured as the label, but it is often represented as a vector and could thus be quite long. For instance, in Mahout, we could return this as a JSON string (we're using GSON over there)
          2. The distance from the centroid used in clustering.

          Could we want other per-cluster information in the future? (I assume yes)

          See #1 in the previous.

          What other possible information could be added in the future?

          Hard to say, but the nature of this implementation is such that people will be able to plug in their own clustering algorithms, which may have different outputs. Until we have at least one other implementation, it will be difficult to "harden" the interfaces. For now, though, your proposed alterations to the format are fine with me.

          Seems like it would be nice if we could handle unknown field types gracefully?

          Yes, that would be good.

          Yonik Seeley added a comment -

          I hit an error trying to cluster some documents I added with solr cell - 400 unknown field "Author".
          Seems like it would be nice if we could handle unknown field types gracefully?

          Yonik Seeley added a comment -

          The response structure is a bit funny (it's like normal XML, which we don't really use in Solr-land), and certainly not optimal for JSON responses:

           "clusters":[
            "cluster",[
          	"labels",[
          	 "label","DDR"],
          	"docs",[
          	 "doc","TWINX2048-3200PRO",
          	 "doc","VS1GB400C3",
          	 "doc","VDBDB1A16"]],
            "cluster",[
          	"labels",[
          	 "label","Car Power Adapter"],
          	"docs",[
          	 "doc","F8V7067-APL-KIT",
          	 "doc","IW-02"]],
          [...]
          

          Is "labels" needed because there could be multiple labels per cluster in the future? (I assume yes)
          Do we need more per-doc information than just the id? (I assume no)
          Could we want other per-cluster information in the future? (I assume yes)
          What other possible information could be added in the future?

          Given the assumptions above, "clusters", "docs", and "labels" should all be arrays instead of NamedLists (the names are just repeated redundant info).
          All of the remaining NamedLists(just each "cluster") should be a SimpleOrderedMap since access by key is more important than order... that will give us something along the lines of:

          "clusters" : [
              { "labels" : ["DDR"],
          	"docs":["TWINX2048-3200PRO","VS1GB400C3","VDBDB1A16"]
              }
              ,
              { "labels" : ["Car Power Adapter"],
          	"docs":["F8V7067-APL-KIT","IW-02"]
              }
          ]
          

          Make sense?
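          As a sketch of the proposed format, the structure above can be mocked up with plain collections — LinkedHashMap standing in here for Solr's SimpleOrderedMap, and the class/method names being illustrative rather than anything from the patch:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClusterResponseDemo {
    // Build one cluster entry: an insertion-ordered map (standing in for
    // Solr's SimpleOrderedMap) with plain arrays for "labels" and "docs",
    // rather than NamedLists with repeated "label"/"doc" keys.
    static Map<String, Object> cluster(List<String> labels, List<String> docs) {
        Map<String, Object> c = new LinkedHashMap<>();
        c.put("labels", labels);
        c.put("docs", docs);
        return c;
    }

    public static void main(String[] args) {
        // "clusters" itself is a plain array of cluster entries.
        List<Map<String, Object>> clusters = new ArrayList<>();
        clusters.add(cluster(List.of("DDR"),
                List.of("TWINX2048-3200PRO", "VS1GB400C3", "VDBDB1A16")));
        clusters.add(cluster(List.of("Car Power Adapter"),
                List.of("F8V7067-APL-KIT", "IW-02")));
        System.out.println(clusters);
    }
}
```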

          Brad Giaccio made changes -
          Attachment clustering-componet-shard.patch [ 12408762 ]
          Brad Giaccio made changes -
          Attachment clustering-componet-shard.patch [ 12409722 ]
          Brad Giaccio added a comment -

          Okay, I've rewritten the patch as I suggested. Now the clustering happens in finishStage for distributed queries and in process for non-distributed queries, both by calling the new method clusterResults. To make this happen I had to convert the interfaces and supporting code to use SolrDocumentList rather than DocList.

          I've added a unit test which extends TestDistributedSearch, I had to modify TestDistributedSearch and make a bunch of things protected. This allowed me to write a very small test case (just had to override doTest) and leave all the logic for creating shards, distributing docs, and comparing responses in TestDistributedSearch. I felt this made for a very clean way to test a single distributed component.

          Koji Sekiguchi added a comment -
          <str name="Tokenizer.analyzer">fully.qualified.class.Name</str>
          

          This works as expected w/o my patch. Thank you, Stanislaw!

          Stanislaw Osinski added a comment -

          Ah, I should have mentioned that up front – Carrot2 will try to convert the string into the type accepted by the attribute. In the case of class-typed attributes, it will try to load the class using the current thread's context classloader. Conversions are also available for numeric, boolean and enum attributes (see: http://download.carrot2.org/head/javadoc/org/carrot2/util/attribute/AttributeBinder.AttributeTransformerFromString.html). Please let me know if that way works for you.

          Koji Sekiguchi added a comment -

          In fact, you can set Carrot2 attributes (both init- and request-time) in the solr config file, this should work also without the patch. Just add:

          <str name="Tokenizer.analyzer">fully.qualified.class.Name</str>

          Hmm, I thought I needed to assign a Class<?> value (rather than a String) as the second argument of the attribute. I'll try it.

          Stanislaw Osinski added a comment -

          In fact, you can set Carrot2 attributes (both init- and request-time) in the solr config file, this should work also without the patch. Just add:

          <str name="Tokenizer.analyzer">fully.qualified.class.Name</str>

          to the search component element. See http://wiki.apache.org/solr/ClusteringComponent for some example. You'll find list of Carrot2 attributes, their ids and description at: http://download.carrot2.org/stable/manual/#chapter.components.
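
          For reference, a hedged sketch of what such a configuration might look like in solrconfig.xml (the component name, engine structure, and algorithm class shown here are illustrative and should be checked against the wiki page above):

```xml
<searchComponent name="clustering"
                 class="org.apache.solr.handler.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <!-- Carrot2 attribute passed straight through to the clustering engine -->
    <str name="Tokenizer.analyzer">fully.qualified.class.Name</str>
  </lst>
</searchComponent>
```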

          Koji Sekiguchi made changes -
          Attachment SOLR-769-analyzerClass.patch [ 12408894 ]
          Koji Sekiguchi added a comment -

          patch for "carrot.analyzerClass" feature.

          Koji Sekiguchi added a comment -

          The catch with analyzer is that this specific attribute is an initialization-time attribute, so you need to add it to the initAttributes map in the init() method of CarrotClusteringEngine.

          This solves the problem. Thank you!

          Stanislaw Osinski added a comment -

          Pasting the comment I made on the list:

          The catch with analyzer is that this specific attribute is an initialization-time attribute, so you need to add it to the initAttributes map in the init() method of CarrotClusteringEngine.

          Please let me know if this solves the problem. If not, I'll investigate further.
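
          The init-time vs. request-time distinction can be illustrated with a self-contained toy engine (all names here are invented for this sketch; this is not the actual CarrotClusteringEngine code):

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of why an initialization-time attribute set at request time is
// silently ignored: the engine only reads it once, in init().
public class InitVsRequestAttributes {
    private String analyzerClass = "DefaultAnalyzer";

    // Initialization-time attributes are consumed here, once.
    void init(Map<String, Object> initAttributes) {
        Object a = initAttributes.get("Tokenizer.analyzer");
        if (a != null) analyzerClass = a.toString();
    }

    // Request-time attributes are consumed per call; "Tokenizer.analyzer"
    // is never looked up here, so putting it in this map has no effect.
    String cluster(Map<String, Object> requestAttributes) {
        return analyzerClass;
    }

    public static void main(String[] args) {
        // Attribute passed at request time: ignored
        InitVsRequestAttributes engine = new InitVsRequestAttributes();
        engine.init(new HashMap<>());
        Map<String, Object> request = new HashMap<>();
        request.put("Tokenizer.analyzer", "MyJapaneseAnalyzer");
        System.out.println(engine.cluster(request)); // still DefaultAnalyzer

        // Attribute passed at init time: picked up
        InitVsRequestAttributes engine2 = new InitVsRequestAttributes();
        Map<String, Object> init = new HashMap<>();
        init.put("Tokenizer.analyzer", "MyJapaneseAnalyzer");
        engine2.init(init);
        System.out.println(engine2.cluster(new HashMap<>())); // MyJapaneseAnalyzer
    }
}
```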

          Koji Sekiguchi added a comment - - edited

          (snip off from http://www.nabble.com/questions-about-Clustering-tt23681134.html)

          I'd like to use this cool stuff in environments other than English, e.g. Japanese.

          I've implemented Carrot2JapaneseAnalyzer (w/ Payload/ITokenType) for this purpose. It worked well with ClusteringDocumentList example, but didn't work with CarrotClusteringEngine.

          What I did was insert the following lines ('+') into CarrotClusteringEngine:

          attributes.put(AttributeNames.QUERY, query.toString());
          + attributes.put(AttributeUtils.getKey(Tokenizer.class, "analyzer"),
          + Carrot2JapaneseAnalyzer.class);
          

          There are no runtime errors, but Carrot2 didn't use my analyzer; it just ignored it and used ExtendedWhitespaceAnalyzer (confirmed via debugger).

          Is it a classloader problem? I placed my jar in ${solr.solr.home}/lib.

          Grant Ingersoll added a comment -

          A second option would have been to move the body of the process method to finishStage. This would have the benefit of only needing to do the clustering on the final set of responses, after the QueryComponent does its job of creating the final result set. This would also not make finishStage so dependent on what is happening in the engines when they create their cluster response.

          I would say that this is actually the correct way to do this, as opposed to just stitching the results together. For example, it may very well make sense that results from shard 1 belong in cluster A when clustered on the main node, whereas they belong to cluster B when only clustered on the shard.

          If you can make that change and then add some tests, I can commit.

          I'm still trying to wrap my head around TestDistributedSearch to see how I can provide test methods.

          Please add any insight you have to http://wiki.apache.org/solr/WritingDistributedSearchComponents.
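
          The shard-boundary effect described above can be demonstrated with a toy, self-contained example (not Solr or Carrot2 code): documents reduced to 1-D scores and grouped by a simple gap rule cluster differently when each shard is clustered separately and stitched than when the merged result set is clustered once:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration: per-shard clustering + stitching vs. clustering the
// merged result set on the coordinating node.
public class ShardClusteringDemo {
    // Group values into clusters, starting a new cluster whenever the gap
    // to the previous (sorted) value exceeds maxGap.
    static List<List<Double>> cluster(List<Double> values, double maxGap) {
        List<Double> sorted = new ArrayList<>(values);
        sorted.sort(null);
        List<List<Double>> clusters = new ArrayList<>();
        List<Double> current = new ArrayList<>();
        for (double v : sorted) {
            if (!current.isEmpty() && v - current.get(current.size() - 1) > maxGap) {
                clusters.add(current);
                current = new ArrayList<>();
            }
            current.add(v);
        }
        if (!current.isEmpty()) clusters.add(current);
        return clusters;
    }

    public static void main(String[] args) {
        List<Double> shard1 = Arrays.asList(1.0, 2.0, 9.0);
        List<Double> shard2 = Arrays.asList(3.0, 10.0);

        // Stitching per-shard clusters: {1,2} {9} from shard 1, {3} {10}
        // from shard 2 -- four clusters in total.
        List<List<Double>> stitched = new ArrayList<>();
        stitched.addAll(cluster(shard1, 1.5));
        stitched.addAll(cluster(shard2, 1.5));

        // Clustering the merged results: 3.0 bridges to 2.0 and 10.0 joins
        // 9.0, which no single shard could see -- two clusters.
        List<Double> merged = new ArrayList<>(shard1);
        merged.addAll(shard2);
        List<List<Double>> global = cluster(merged, 1.5);

        System.out.println("stitched: " + stitched.size() + " clusters");
        System.out.println("global:   " + global.size() + " clusters");
    }
}
```

          The stitched output has four clusters while the global clustering finds two, which is exactly why clustering in finishStage over the merged responses is preferable to stitching.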

          Brad Giaccio made changes -
          Attachment clustering-componet-shard.patch [ 12408762 ]
          Brad Giaccio added a comment -

          This is a patch to add shard support to the ClusteringComponent.

          Much like the recently posted spell check shard patch it simply implements finishStage and stitches the response together.

          A second option would have been to move the body of the process method to finishStage. This would have the benefit of only needing to do the clustering on the final set of responses, after the QueryComponent does its job of creating the final result set. This would also not make finishStage so dependent on what is happening in the engines when they create their cluster response.

          I'm still trying to wrap my head around TestDistributedSearch to see how I can provide test methods.

          If option 2 that I laid out is preferred I should be able to provide a patch for that as well.

          Grant Ingersoll added a comment - - edited

          Committed revision 776692.

          Thanks to everyone who helped out, especially Carrot2 creators Dawid and Stanislaw.

          Allahbaksh Mohammedali added a comment -

          Hi Grant,
          I am keenly looking forward to this feature and want to see it in action as soon as possible. When will the code be committed to the repo?

          Stanislaw Osinski added a comment -

          Thanks Grant! Looking forward to seeing the code in the repo!

          S.

          Grant Ingersoll made changes -
          Attachment SOLR-769.patch [ 12408051 ]
          Grant Ingersoll added a comment -

          OK, I think all the ducks are in a row.

          I intend to commit on Friday.

          Grant Ingersoll made changes -
          Attachment SOLR-769.patch [ 12405885 ]
          Attachment SOLR-769.tar [ 12405886 ]
          Grant Ingersoll added a comment -

          OK, I think this is ready to go, except I still need to double check how it works with the release. Since we can't distribute LGPL code, this is going to have to be a source-only release artifact and thus can never be in the WAR, unfortunately.

          The tarball contains the JAR files that one needs, with the exception of the LGPL deps, which are downloaded from the appropriate places.

          Grant Ingersoll made changes -
          Comment [ Where can we download nni.jar from?

          Seems like if you only need two classes it would be easy enough to replace them with your own code. ]
          Stanislaw Osinski added a comment -

          NNI JAR is indeed LGPL, it comes from MTJ: http://ressim.berlios.de/. It's also included in Carrot2 trunk, not in the main lib/ dir, but in /core/carrot2-util-matrix/lib.

          At the time we integrated it with Carrot2 (a few years ago), it used to be distributed as a separate dependency for MTJ; now it's included in the MTJ JAR. As MTJ is quite big and we need literally two classes that are in nni.jar, I'd prefer to make the NNI JAR, as it is, a part of the download, with a reference to the MTJ project. Would that make sense?

          S.

          Grant Ingersoll added a comment -

          Looks like we need to make the NNI JAR be a download, too, right? It appears to be LGPL. Where does that library come from, anyway? I don't see it on Carrot trunk, but it is in the zip. And a search for it doesn't reveal much.

          -Grant

          Stanislaw Osinski added a comment -

          Hi Grant,

          If you download http://download.carrot2.org/stable/carrot2-java-api-3.0.1.zip, you'll find licenses in the lib/ folder of the distribution. That distribution contains slightly more JARs than needed for Solr (which uses carrot2-mini.jar), so you'd need to pick only those that are relevant.

          S.

          Grant Ingersoll added a comment -

          Hi Stanislaw,

          I'm going to commit soon and I was wondering if Carrot2 has a handy place where they keep all the licenses and notices so that I can fill out Solr's NOTICE.txt and LICENSE.txt. If not, I will go collate them.

          Stanislaw Osinski added a comment -

          Also, you say C2 can handle full docs, is it feasible, then, to implement it for the "offline" mode I have in mind, whereby you cluster the whole collection offline and then store the clusters for retrieval? I haven't implemented this yet, but was thinking some people will be interested in full corpus clustering. The nice thing, then, is that as new documents come in, they can be added to existing clusters (and maybe periodically, we re-cluster). Just thinking out loud.

          We have two variables here: the length of docs and the number of docs. Carrot2 is suitable for small numbers of docs (up to say 1000). If the docs are short (a paragraph or so), the clustering should be pretty fast, suitable for on-line processing (see: http://project.carrot2.org/algorithms.html). If the documents get longer, Carrot2 will still handle them, but will require some more time for processing, I'll try to do some measurements. But C2 is not useful for the "whole collection" case – it performs all processing in-memory and here we'd need a totally different class of algorithm, something along the lines of Mahout developments.

          Hmm, that's an interesting thought. We could check to see if highlighting is done first.

          To quickly summarise the pros and cons of relying on highlighting being done outside of the clustering component:

          Pros:

          • we avoid duplication of processing (highlighting being done twice)
          • simpler code of the clustering component, less configuration

          Cons:

          • if someone doesn't want highlighting in the search results, the clustering is likely to take more time (because it operates on full documents, and it's controlled globally)
          • depending on the highlighter, we may get some markup in the summaries, which may affect clustering (I'd need to check how Carrot2 handles that)

          Should the MockClusteringAlgorithm be under the test source tree and not the main one? I moved it in the patch to follow

          Absolutely, it should be in the test source.

          I don't think we need to output the number of clusters, since that will be obvious from the list size. I dropped it in the patch to follow

          Makes sense, I kept it because the original version had it.

          Also, on the response structure, we certainly could make it optional, although it means having to go do a lookup in the real doc list, which could be less than fun.

          By "lookup" you mean the lookup in the XML response? Here again we have a trade off between the length of the response and ease of processing: if we repeat document titles / snippets in the clusters structure, we at least double the response size (at least because the same document may belong to many clusters), but can potentially save some lookups. But if we want to get some other fields of a document (other than we repeat in the clusters list), we'd still need a lookup.

          To sum up, my intuition would be to avoid duplication and stick with document ids in the cluster list (this is what we do in Carrot2 XMLs as well). Optionally, the clustering component could have a list of configurable fields to be repeated in the cluster list if that's really helpful in real-world use cases.

          Grant Ingersoll made changes -
          Attachment SOLR-769.patch [ 12403389 ]
          Grant Ingersoll added a comment -

          Should the MockClusteringAlgorithm be under the test source tree and not the main one? I moved it in the patch to follow

          I don't think we need to output the number of clusters, since that will be obvious from the list size. I dropped it in the patch to follow

          Also, on the response structure, we certainly could make it optional, although it means having to go do a lookup in the real doc list, which could be less than fun.

          Patch to follow

          Grant Ingersoll added a comment -

          Highlighting:

          Hmm, that's an interesting thought. We could check to see if highlighting is done first.

          Also, you say C2 can handle full docs, is it feasible, then, to implement it for the "offline" mode I have in mind, whereby you cluster the whole collection offline and then store the clusters for retrieval? I haven't implemented this yet, but was thinking some people will be interested in full corpus clustering. The nice thing, then, is that as new documents come in, they can be added to existing clusters (and maybe periodically, we re-cluster). Just thinking out loud.

          Rest of the stuff in that comment sounds good. I will try out the patch.

          Stanislaw Osinski made changes -
          Attachment SOLR-769.patch [ 12401945 ]
          Stanislaw Osinski made changes -
          Attachment SOLR-769.zip [ 12402688 ]
          Stanislaw Osinski added a comment -

          Further code clean-ups, support for passing initialization-time attributes to Carrot2 algorithms, some comments in the example configuration file.

          Shalin Shekhar Mangar made changes -
          Fix Version/s 1.4 [ 12313351 ]
          Shalin Shekhar Mangar added a comment -

          Marking for 1.4 release

          Stanislaw Osinski made changes -
          Attachment SOLR-769-lib.zip [ 12402482 ]
          Stanislaw Osinski added a comment -

          Libs with Carrot2 v3.0.1, which we've just released.

          Stanislaw Osinski made changes -
          Attachment SOLR-769-lib.zip [ 12401946 ]
          Stanislaw Osinski added a comment - - edited

          Hi All,

          I've just uploaded a patch that passes unit tests and has a working example, but this is by no means a final version. A few outstanding questions / issues:

          1. Response structure.

          I was wondering – do we need to repeat the document contents in the 'clusters' response section? Assuming that each document in the index has a unique ID, we could reduce the size of the response by just referencing documents by IDs like this:

          <lst name="clusters">
           <int name="numClusters">3</int>
           <lst name="cluster">
            <lst name="labels">
              <str name="label">GPU VPU Clocked</str>
            </lst>
            <lst name="docs">
              <str name="doc">EN7800GTX/2DHTV/256M</str>
              <str name="doc">100-435805</str>
            </lst>
           </lst>
           <lst name="cluster">
            <lst name="labels">
              <str name="label">Hard Drive</str>
            </lst>
            <lst name="docs">
              <str name="doc">6H500F0</str>
              <str name="doc">SP2514N</str>
            </lst>
           </lst>
           <lst name="cluster">
            <lst name="labels">
              <str name="label">Other Topics</str>
            </lst>
            <lst name="docs">
              <str name="doc">9885A004</str>
            </lst>
           </lst>
          </lst>

          Actually, this is what I've implemented in the patch.

          Also, in case of hierarchical clusters I've introduced a grouping entity called "clusters" so that the top- and sub-levels of the response are consistent (see unit tests). Please let me know if this makes sense.


          2. Build: compile warnings about missing SimpleXML

          SimpleXML is one of the problematic dependencies as it's GPL. Luckily, it's not needed at runtime, but generates warnings about missing dependencies during compile time. So the option is either to live with the warnings or to add SimpleXML (version 1.7.2) to get rid of the warnings.


          3. Build: copying of protowords.txt etc

          The patch includes lexical files both in the contrib/clustering/src/java/test/resources/.... and in the examples dir. I'm not sure how this is handled though – do you keep copies in the repository or copy those somehow in the build?


          4. Highlighting

          This is the bit I've not yet fully analyzed. In general, Carrot2 should handle full documents fairly well (up to, say, a few hundred kB each); it's just the number of documents that must be on the order of hundreds. Therefore, highlighting is not mandatory, but it may sometimes improve the quality of clusters.

          I was wondering, if highlighting is performed earlier in the Solr pipeline, could this be reused during clustering? One possible approach could be that clustering uses whatever is fed from the pipeline: if highlighting is enabled, clustering will be performed on the highlighted content, if there was no highlighting, we'd cluster full documents. Not sure if that's reasonable / possible to implement though.


          5. Documentation (wiki) updates

          Once we stabilise the ideas, I'm happy to update the wiki with regard to the algorithms used (Lingo/STC) and passing additional parameters.

          Stanislaw Osinski made changes -
          Attachment SOLR-769.patch [ 12401945 ]
          Attachment SOLR-769-lib.zip [ 12401946 ]
          Stanislaw Osinski added a comment -

          Yet another patch, this time with passing unit tests and a working example. Will make some more comments in a sec. Please use the SOLR-769-lib.zip libs with this patch.

          Stanislaw Osinski added a comment -

          Hi Grant,

          I've added a Carrot2 issue referring to point 3 on your TODO list: http://issues.carrot2.org/browse/CARROT-457. I'll be looking into this over the weekend.

          Staszek

          Grant Ingersoll made changes -
          Attachment SOLR-769.patch [ 12399922 ]
          Grant Ingersoll added a comment -

          Here's a patch for Carrot2 3.0 that COMPILES ONLY.
          You will need to download the clustering-libs.tar.gz from http://people.apache.org/~gsingers/clustering-libs.tar.gz as it is too big to upload to JIRA.

          TODO:
          1. Tests passing and more tests
          2. Update NOTICE.txt and LICENSE.txt
          3. Get trimmed down Carrot2 library that doesn't have all the Document Source dependencies, and preferably the web services deps. Solr doesn't need the Google, etc. API deps. Preferably remove the LGPL deps too, but for now, they are downloaded via ANT from the Maven repositories.
          4. Update the Maven template
          5. Hook in the builds
          6. Make sure the example works

          Grant Ingersoll made changes -
          Attachment SOLR-769.patch [ 12397778 ]
          Grant Ingersoll added a comment -

          Updated to trunk. See http://wiki.apache.org/solr/ClusteringComponent
          Grant Ingersoll added a comment -

          Hi Bruce,

          I haven't done any perf. testing, as I've been focused on functionality first. However, I'm not sure whether that query was the first one run, or not, so I don't know the status of the searcher, etc. I'm pretty sure I don't have any warming queries, etc.

          Stanislaw Osinski added a comment -

          Bruce,

          For performance of the clustering algorithm alone, please take a look at: http://project.carrot2.org/algorithms.html
          Obviously, you'd need to add the overhead of fetching the snippets / documents from the index. I'm not sure how many are fetched or whether they come from Solr's cache, so I'm not sure whether clustering or fetching time dominates.

          Cheers,

          Staszek

          Bruce Ritchie added a comment -

          Grant,

          This patch looks very promising, I can't wait to give it a try and find a way to incorporate it into a project I'm working on (when it's ready of course ... likely not till after Carrot2 3 is released though)

          Can you give a quick estimate as to the performance impact of enabling clustering in search results mode? In the example @ http://wiki.apache.org/solr/ClusteringFullResultsExample the query time seems pretty high and I was wondering if that was a result of this patch or something else?

          Thanks,

          Bruce Ritchie

          Vaijanath N. Rao added a comment -

          Hi Grant,

          Till now I have worked mostly with full document clustering. Had never thought of search snippet clustering. I will definitely pitch in for clustering library. There are many libraries which have favourable/acceptable licensing terms which can be added to Solr.

          --Thanks and Regards
          Vaijanath

          Grant Ingersoll added a comment -

          So what would be the procedure to add some clustering code beyond carrot or other available libraries.

          Essentially, you need to implement either a SearchClusteringEngine or a DocumentClusteringEngine and then declare it in the SearchComponent configuration, as is done with the Carrot2 example here:

          <lst name="engine">
                <!-- The name, only one can be named "default" -->
                <str name="name">default</str>
                <!-- Carrot2 specific parameters.  See the Carrot2 site for details on setting. -->
                <!-- carrot.algorithm:   Optional.  Currently only
                lingo is supported pending the release of Carrot2 3.0.  
                 -->
                <str name="carrot.algorithm">lingo</str>
                <!-- Lingo specific -->
                <float name="carrot.lingo.threshold.clusterAssignment">0.150</float>
                <float name="carrot.lingo.threshold.candidateClusterThreshold">0.775</float>
              </lst>
          

          or, in the mock setup:

          <lst name="engine">
                <!-- The name, only one can be named "default" -->
                <str name="name">docEngine</str>
                <str name="classname">org.apache.solr.handler.clustering.MockDocumentClusteringEngine</str>
              </lst>
          

          If you don't declare the classname value, then it assumes the Carrot implementation.

          Naturally, you need to take care of all the libraries being available to Solr, etc. just as you would for any plugin.

          Since you are interested in clustering, Vaijanath, it would be good to get your feedback on the APIs. Are you doing full document clustering or just search snippet clustering? Also, if you are using an open source clustering library that has acceptable licensing terms (i.e. not GPL or similar), perhaps consider contributing an implementation of the engine and then we can make it available to everyone.
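The plugin pattern described above can be sketched as follows. Note that this uses a simplified, hypothetical interface purely for illustration: the actual SearchClusteringEngine / DocumentClusteringEngine contracts in the patch take Solr-specific types (DocList, SolrParams, etc.) rather than plain strings.

```java
import java.util.*;

// Hypothetical stand-in for the engine contract; the real interfaces
// live in org.apache.solr.handler.clustering and differ in signature.
interface ClusteringEngine {
    Map<String, List<String>> cluster(List<String> docIds, List<String> snippets);
}

// Toy engine: groups documents by the first word of their snippet.
// A real engine would delegate to a library such as Carrot2.
class FirstWordClusteringEngine implements ClusteringEngine {
    @Override
    public Map<String, List<String>> cluster(List<String> docIds, List<String> snippets) {
        Map<String, List<String>> clusters = new TreeMap<>();
        for (int i = 0; i < docIds.size(); i++) {
            String label = snippets.get(i).trim().split("\\s+")[0];
            clusters.computeIfAbsent(label, k -> new ArrayList<>()).add(docIds.get(i));
        }
        return clusters;
    }
}

public class EngineSketch {
    public static void main(String[] args) {
        ClusteringEngine engine = new FirstWordClusteringEngine();
        Map<String, List<String>> clusters = engine.cluster(
            Arrays.asList("SP2514N", "6H500F0", "100-435805"),
            Arrays.asList("Hard Drive 250GB", "Hard Drive 500GB", "GPU Clocked 550MHz"));
        for (Map.Entry<String, List<String>> e : clusters.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

A real implementation would then be wired in via the "classname" entry shown in the mock setup above, with its jars on Solr's plugin classpath.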

          Vaijanath N. Rao added a comment -

          Hi Grant,

          After just some minor copying of the .txt files, I got this working without any problems.

          So what would be the procedure to add some clustering code beyond carrot or other available libraries.

          --Thanks and Regards
          Vaijanath

          Grant Ingersoll made changes -
          Attachment SOLR-769.patch [ 12392572 ]
          Grant Ingersoll added a comment -

          How about a patch where the tests pass? Here ya go...

          Grant Ingersoll made changes -
          Attachment SOLR-769.patch [ 12392566 ]
          Grant Ingersoll added a comment -

          OK, here's a first scratch at the component side of document clustering. There are no implementations of the DocumentClusteringEngine yet, so I am a bit hesitant to even throw out a proposed API for that yet, but the current one is pretty generic, which is both good and bad. I don't particularly like passing around something as open as SolrParams, but I don't think I can pin down a generic set of explicit parameters either.

          Grant Ingersoll made changes -
          Attachment SOLR-769.patch [ 12392515 ]
          Grant Ingersoll added a comment -

          Removed the alternate algorithm implementations, but left in some of the framework for adding them. The Carrot2 maintainers are likely to remove Fuzzy Ants and some of the other implementations in 3.0, which is due out sometime soon. Thus, I'd rather not support something that isn't recommended.

          I'm likely to commit this fairly soon.

          -Grant

          Grant Ingersoll added a comment -

          Note, also, that even though I put in support for some of the other C2 (Carrot2) algorithms, I don't think they quite work yet. I think they require passing in more parameters to set some algorithm properties (for instance, for Fuzzy Ants, I think you need to set a depth) and I haven't figured those out yet. If you have C2 experience, insight would be appreciated.

          For now, stick to Lingo.

          Grant Ingersoll made changes -
          Attachment clustering-libs.tar [ 12392181 ]
          Grant Ingersoll added a comment -

          Untar in contrib/clustering/lib.

          Grant Ingersoll made changes -
          Attachment SOLR-769.patch [ 12392180 ]
          Grant Ingersoll added a comment -

          Here's a patch that actually passes the tests.

          Note, there's still a little oddity with the Snowball program that needs to be worked out, so I don't recommend running this patch in production yet. The issue is that both Carrot2 and Solr depend on Snowball, but on different versions; furthermore, Carrot2 goes one further and slightly modifies the Snowball class names.

          I will upload new libs in a minute.

          Grant Ingersoll added a comment -

          Yeah, I probably will include the other jars and make it easy to include them. For now, I wanted to get something basic working for a talk I'm giving on Wednesday night.

          Andrzej Bialecki added a comment -

          FYI, Carrot2 does support a handful of different clustering algorithms (the ones I know of are Fuzzy Ants, KMeans and Suffix Tree, in addition to Lingo).

          Grant Ingersoll added a comment -

          Still to do, more testing, get feedback, implement basics of doc. clustering. This last piece will take some more design work. Also need to validate some more that the results make sense for search results clustering, but my first look suggests they do.

          Grant Ingersoll made changes -
          Attachment SOLR-769.patch [ 12391950 ]
          Grant Ingersoll added a comment -

          More updates, added example

          Grant Ingersoll made changes -
          Attachment SOLR-769.patch [ 12391945 ]
          Grant Ingersoll added a comment -

          First draft of a patch.

          Notes:

          1. Carrot2 uses the snowball stemmers, but it shouldn't clash, b/c it actually slightly changes the names of them to be like englishStemmer (as opposed to EnglishStemmer). I'm debating whether or not to just re-implement this so that it can use the same snowball stemmers we use in Solr. Probably not a big deal.

          2. I haven't implemented document clustering yet. To do this, I need to set up a background thread that will be spawned to do the clustering, since it is presumably going through some large set of documents and clustering them. It will probably require term vectors, and it will introduce a dep. on Mahout, so I'll need a version of that library too.

          3. It would be really cool for the Carrot2 implementation to support using other clustering algs besides Lingo. Basically, this just needs to be factored into the configuration and the jars included in the distribution. This is not a high priority for me at the moment.

          TODO:
          More tests.
          Decide on output format
          Implement doc. clustering framework part (i.e. spawning of threads, commands)
          ????

          Grant Ingersoll made changes -
          Attachment clustering-libs.tar [ 12391944 ]
          Grant Ingersoll added a comment -

          Clustering libs

          Grant Ingersoll added a comment -

          Patch soon, as a start. I'm going to check in the basic directory structure and libs, and then provide a patch with the source that we can iterate on.

          Grant Ingersoll made changes -
          Field Original Value New Value
          Status Open [ 1 ] In Progress [ 3 ]
          Grant Ingersoll added a comment -

          Starting docs at http://wiki.apache.org/solr/ClusteringComponent
          Grant Ingersoll created issue -

            People

            • Assignee: Grant Ingersoll
            • Reporter: Grant Ingersoll
            • Votes: 6
            • Watchers: 19