Solr / SOLR-769

Support Document and Search Result clustering

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: contrib - Clustering
    • Labels:
      None

      Description

      Clustering is a useful tool for working with documents and search results, similar to the notion of dynamic faceting. Carrot2 (http://project.carrot2.org/) is a nice, BSD-licensed library for search results clustering. Mahout (http://lucene.apache.org/mahout) is well suited for whole-corpus clustering.

      The patch lays out a contrib module that starts with a SearchComponent for clustering and an implementation backed by Carrot2. In search results mode, it uses the DocList as the input to the clustering. Carrot2 ships with a Solr input component, but it differs from my SearchComponent: the Carrot2 example actually submits a query to Solr, whereas my SearchComponent is simply chained into the component list and uses the ResponseBuilder to add the cluster results.
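
      The chaining idea can be illustrated with a minimal sketch. It is written in Python for brevity and is not Solr's API: the classes `ResponseBuilder`, `QueryComponent`, and `ClusteringComponent`, the `handle` function, and the "group by last title word" rule are hypothetical stand-ins (the real component would delegate to Carrot2). The point it shows is that the clustering step reads the document list an earlier component already produced and appends to the same response, rather than issuing its own query.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseBuilder:
    """Hypothetical stand-in for the state shared by all components."""
    query: str
    doc_list: list = field(default_factory=list)  # filled by the query step
    response: dict = field(default_factory=dict)  # what the client receives


class QueryComponent:
    """Runs the search once and records the matching documents."""

    def __init__(self, index):
        self.index = index  # list of {"id": ..., "title": ...} dicts

    def process(self, rb):
        term = rb.query.lower()
        rb.doc_list = [d for d in self.index if term in d["title"].lower()]
        rb.response["docs"] = [d["id"] for d in rb.doc_list]


class ClusteringComponent:
    """Clusters the documents already in rb.doc_list; issues no query."""

    def process(self, rb):
        clusters = {}
        for doc in rb.doc_list:
            # Toy rule (an assumption, not Carrot2): label by last title word.
            label = doc["title"].split()[-1].lower()
            clusters.setdefault(label, []).append(doc["id"])
        rb.response["clusters"] = clusters


def handle(query, components):
    """Run every component in order against one shared ResponseBuilder."""
    rb = ResponseBuilder(query=query)
    for component in components:
        component.process(rb)
    return rb.response


INDEX = [
    {"id": "1", "title": "Apache Solr search"},
    {"id": "2", "title": "Apache Lucene search"},
    {"id": "3", "title": "Apache Mahout clustering"},
    {"id": "4", "title": "Carrot2 clustering"},
]
```

      Calling `handle("apache", [QueryComponent(INDEX), ClusteringComponent()])` runs the query once and returns a response holding both the matching ids and the clusters built from them.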

      While not fully fleshed out yet, the collection-based mode will take a list of ids, or just use the whole collection, and produce clusters. Since this is a longer, typically offline task, it will need some type of storage mechanism (and possibly replication) for the clusters. I may push this off to a separate JIRA issue, but I at least want to present the use case as part of the design of this component/contrib. It may even make sense to split this out, such that the building piece is something like an UpdateProcessor and the SearchComponent just acts as a lookup mechanism.
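
      One way to picture the proposed split is the sketch below, again in Python and under stated assumptions: `ClusterStore`, `build_clusters`, and `lookup` are hypothetical names, the store is a plain in-memory dictionary, and the grouping rule is a toy placeholder for a real algorithm. It separates an offline build step (over given ids or the whole collection) from a cheap query-time lookup.

```python
class ClusterStore:
    """Hypothetical storage for precomputed clusters (in memory here)."""

    def __init__(self):
        self._label_by_id = {}

    def save(self, clusters):
        # Invert {label: [ids]} into {id: label} for fast lookups.
        self._label_by_id = {
            doc_id: label
            for label, ids in clusters.items()
            for doc_id in ids
        }

    def label_for(self, doc_id):
        return self._label_by_id.get(doc_id)


def build_clusters(collection, ids=None):
    """Offline step: cluster the given ids, or everything if ids is None."""
    selected = [d for d in collection if ids is None or d["id"] in ids]
    clusters = {}
    for doc in selected:
        # Toy rule (an assumption): label each document by its last title word.
        label = doc["title"].split()[-1].lower()
        clusters.setdefault(label, []).append(doc["id"])
    return clusters


def lookup(store, doc_ids):
    """Query-time step: only reads stored labels, does no clustering."""
    return {doc_id: store.label_for(doc_id) for doc_id in doc_ids}


COLLECTION = [
    {"id": "1", "title": "Apache Solr search"},
    {"id": "2", "title": "Apache Lucene search"},
    {"id": "3", "title": "Apache Mahout clustering"},
    {"id": "4", "title": "Carrot2 clustering"},
]
```

      In this arrangement the expensive work happens once in `build_clusters`, and the search-time path touches only the store, which is why the storage (and replication) question matters for the design.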

        Attachments

        1. clustering-libs.tar
          1.54 MB
          Grant Ingersoll
        2. SOLR-769.patch
          104 kB
          Grant Ingersoll
        3. SOLR-769.patch
          150 kB
          Grant Ingersoll
        4. SOLR-769.patch
          193 kB
          Grant Ingersoll
        5. clustering-libs.tar
          1.87 MB
          Grant Ingersoll
        6. SOLR-769.patch
          183 kB
          Grant Ingersoll
        7. SOLR-769.patch
          191 kB
          Grant Ingersoll
        8. SOLR-769.patch
          193 kB
          Grant Ingersoll
        9. SOLR-769.patch
          187 kB
          Grant Ingersoll
        10. SOLR-769.patch
          164 kB
          Grant Ingersoll
        11. SOLR-769-lib.zip
          1.68 MB
          Stanislaw Osinski
        12. SOLR-769.zip
          42 kB
          Stanislaw Osinski
        13. SOLR-769.patch
          122 kB
          Grant Ingersoll
        14. SOLR-769.patch
          177 kB
          Grant Ingersoll
        15. SOLR-769.tar
          1.87 MB
          Grant Ingersoll
        16. SOLR-769.patch
          177 kB
          Grant Ingersoll
        17. SOLR-769-analyzerClass.patch
          3 kB
          Koji Sekiguchi
        18. clustering-componet-shard.patch
          21 kB
          Brad Giaccio
        19. SOLR-769.patch
          10 kB
          Yonik Seeley
        20. SOLR-769.patch
          13 kB
          Yonik Seeley
        21. subcluster-flattening.patch
          1 kB
          Stanislaw Osinski

              People

              • Assignee:
                dweiss Dawid Weiss
                Reporter:
                gsingers Grant Ingersoll
              • Votes:
                4
                Watchers:
                15
