SOLR-769

Support Document and Search Result clustering

    Details

    • Type: New Feature
    • Status: Reopened
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: contrib - Clustering
    • Labels: None

      Description

      Clustering is a useful tool for working with documents and search results, similar to the notion of dynamic faceting. Carrot2 (http://project.carrot2.org/) is a BSD-licensed library for clustering search results. Mahout (http://lucene.apache.org/mahout) is well suited for whole-corpus clustering.

      The patch lays out a contrib module that starts with a SearchComponent for clustering and an implementation using Carrot2. In search results mode, it uses the DocList as the input for clustering. While Carrot2 comes with a Solr input component, it differs from my SearchComponent: the Carrot2 example actually submits a query to Solr, whereas my SearchComponent is simply chained into the component list and uses the ResponseBuilder to add the cluster results.
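
To illustrate the difference between chaining a clustering step into the component list and issuing a separate query, here is a minimal, self-contained sketch. It does not use the Solr or Carrot2 APIs: `RequestState`, `Component`, `QueryStage`, and `ClusteringStage` are hypothetical stand-ins (the first loosely playing the role the ResponseBuilder plays in the patch), and grouping by first word is only a toy placeholder for a real clustering algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChainedClusteringSketch {

    /** Hypothetical per-request state shared by every component in the chain. */
    static class RequestState {
        final String query;
        final List<String> docList = new ArrayList<>();            // filled by the query stage
        final Map<String, Object> response = new LinkedHashMap<>(); // sections added by each stage
        RequestState(String query) { this.query = query; }
    }

    /** Hypothetical stand-in for one step in the component chain. */
    interface Component {
        void process(RequestState state);
    }

    /** Query stage: matches documents from a tiny in-memory "index" by substring. */
    static class QueryStage implements Component {
        private final List<String> index;
        QueryStage(List<String> index) { this.index = index; }
        public void process(RequestState state) {
            for (String doc : index) {
                if (doc.toLowerCase().contains(state.query.toLowerCase())) {
                    state.docList.add(doc);
                }
            }
            state.response.put("docs", state.docList);
        }
    }

    /** Clustering stage: reuses the already-computed docList instead of re-querying. */
    static class ClusteringStage implements Component {
        public void process(RequestState state) {
            Map<String, List<String>> clusters = new LinkedHashMap<>();
            for (String doc : state.docList) {
                // Toy labeling rule (assumption): the first word of the document.
                String label = doc.split("\\s+")[0].toLowerCase();
                clusters.computeIfAbsent(label, k -> new ArrayList<>()).add(doc);
            }
            state.response.put("clusters", clusters);
        }
    }

    /** Runs the chain in order; each stage sees what earlier stages produced. */
    static Map<String, Object> handle(String query, List<String> index) {
        RequestState state = new RequestState(query);
        List<Component> chain = Arrays.asList(new QueryStage(index), new ClusteringStage());
        for (Component c : chain) {
            c.process(state);
        }
        return state.response;
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList(
            "Solr clustering contrib", "Solr faceting guide", "Mahout clustering jobs");
        System.out.println(handle("clustering", index));
    }
}
```

The point of the sketch is that the clustering stage never talks to the index: it only reads the document list left behind by the query stage and appends its own section to the same response.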

      While not fully fleshed out yet, the collection-based mode will take a list of ids, or just use the whole collection, and produce clusters. Since this is a longer, typically offline task, there will need to be some type of storage mechanism (and possibly replication?) for the clusters. I may push this off to a separate JIRA issue, but I at least want to present the use case as part of the design of this component/contrib. It may even make sense to split this out, so that the building piece is something like an UpdateProcessor and the SearchComponent just acts as a lookup mechanism.
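
The proposed split between an offline building piece and a lookup-only search step could look roughly like the sketch below. Everything in it is an assumption made for illustration: `ClusterStore`, `build`, `lookup`, and `toyLabel` are hypothetical names, the store is a plain in-memory map (the issue leaves storage and replication open), and the first-word label stands in for a real whole-corpus algorithm.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OfflineClusterSketch {

    /** Hypothetical storage for precomputed clusters: document id to cluster label. */
    static class ClusterStore {
        private final Map<String, String> labelByDocId = new HashMap<>();
        void put(String docId, String label) { labelByDocId.put(docId, label); }
        String lookup(String docId) { return labelByDocId.get(docId); }
    }

    /** Toy placeholder for a clustering algorithm: label is the first word. */
    static String toyLabel(String text) {
        return text.trim().split("\\s+")[0].toLowerCase();
    }

    /**
     * Building piece (offline): clusters the given ids, or the whole
     * corpus when ids is null, and writes the assignments to the store.
     */
    static void build(Map<String, String> corpus, Collection<String> ids, ClusterStore store) {
        Collection<String> targets = (ids == null) ? corpus.keySet() : ids;
        for (String id : targets) {
            String text = corpus.get(id);
            if (text != null) {            // skip ids that are not in the corpus
                store.put(id, toyLabel(text));
            }
        }
    }

    /** Lookup piece (query time): attaches stored labels to result ids, no clustering work. */
    static Map<String, String> lookup(List<String> resultIds, ClusterStore store) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String id : resultIds) {
            String label = store.lookup(id);
            if (label != null) {
                out.put(id, label);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> corpus = new LinkedHashMap<>();
        corpus.put("1", "solr search");
        corpus.put("2", "mahout jobs");
        corpus.put("3", "solr facets");

        ClusterStore store = new ClusterStore();
        build(corpus, null, store);                              // whole-collection mode
        System.out.println(lookup(Arrays.asList("1", "2", "9"), store));
    }
}
```

Keeping the expensive work in `build` means the query-time path is a map lookup, which is the main attraction of splitting the two pieces.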

      1. subcluster-flattening.patch
        1 kB
        Stanislaw Osinski
      2. SOLR-769-lib.zip
        1.68 MB
        Stanislaw Osinski
      3. SOLR-769-analyzerClass.patch
        3 kB
        Koji Sekiguchi
      4. SOLR-769.zip
        42 kB
        Stanislaw Osinski
      5. SOLR-769.tar
        1.87 MB
        Grant Ingersoll
      6. SOLR-769.patch
        104 kB
        Grant Ingersoll
      7. SOLR-769.patch
        150 kB
        Grant Ingersoll
      8. SOLR-769.patch
        193 kB
        Grant Ingersoll
      9. SOLR-769.patch
        183 kB
        Grant Ingersoll
      10. SOLR-769.patch
        191 kB
        Grant Ingersoll
      11. SOLR-769.patch
        193 kB
        Grant Ingersoll
      12. SOLR-769.patch
        187 kB
        Grant Ingersoll
      13. SOLR-769.patch
        164 kB
        Grant Ingersoll
      14. SOLR-769.patch
        122 kB
        Grant Ingersoll
      15. SOLR-769.patch
        177 kB
        Grant Ingersoll
      16. SOLR-769.patch
        177 kB
        Grant Ingersoll
      17. SOLR-769.patch
        10 kB
        Yonik Seeley
      18. SOLR-769.patch
        13 kB
        Yonik Seeley
      19. clustering-libs.tar
        1.54 MB
        Grant Ingersoll
      20. clustering-libs.tar
        1.87 MB
        Grant Ingersoll
      21. clustering-componet-shard.patch
        21 kB
        Brad Giaccio

            People

            • Assignee: Dawid Weiss
            • Reporter: Grant Ingersoll
            • Votes: 4
            • Watchers: 14
