  Spark / SPARK-2429

Hierarchical Implementation of KMeans

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Component/s: MLlib

    Description

      Hierarchical clustering algorithms are widely used and would make a nice addition to MLlib. They are useful for determining relationships between clusters and can also offer faster assignment of points to clusters. Discussion on the dev list suggested the following possible approaches:

      • Top-down, recursive application of KMeans
      • Reuse of the DecisionTree implementation with a different objective function
      • Hierarchical SVD
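
      The first approach, top-down recursive application of KMeans, can be sketched roughly as follows. This is a minimal, single-machine illustration in plain Python, not MLlib code and not the implementation attached to this issue. The function names (two_means, build_tree, assign), the farthest-point initialization, and the max_depth and min_size stopping rules are all assumptions made for the example.

```python
from math import dist  # Euclidean distance between two sequences (Python 3.8+)


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def two_means(points, iters=20):
    """Split points into two groups with plain Lloyd iterations (k = 2).

    Initialization is an assumed heuristic: the point farthest from the
    overall mean, then the point farthest from that one.
    """
    m = mean(points)
    a = max(points, key=lambda p: dist(p, m))
    b = max(points, key=lambda p: dist(p, a))
    centers = [a, b]
    groups = (list(points), [])
    for _ in range(iters):
        new_groups = ([], [])
        for p in points:
            # Assign each point to the nearer of the two current centers.
            nearer = 0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
            new_groups[nearer].append(p)
        groups = new_groups
        if not groups[0] or not groups[1]:
            break  # degenerate split (e.g. all points identical)
        centers = [mean(g) for g in groups]
    return groups


def build_tree(points, max_depth=3, min_size=2, depth=0):
    """Recursively bisect points; returns a nested dict describing the hierarchy."""
    node = {"center": mean(points), "size": len(points), "children": []}
    if depth >= max_depth or len(points) < 2 * min_size:
        return node
    left, right = two_means(points)
    if len(left) < min_size or len(right) < min_size:
        return node  # keep this node as a leaf rather than make tiny clusters
    node["children"] = [
        build_tree(g, max_depth, min_size, depth + 1) for g in (left, right)
    ]
    return node


def assign(node, point):
    """Descend from the root, following the nearer child center, to a leaf."""
    while node["children"]:
        node = min(node["children"], key=lambda c: dist(point, c["center"]))
    return node
```

      In this sketch, assign compares a point against two centers per level instead of against every leaf center, which is one way a hierarchy can make assignment cheaper. The trade-off is that the greedy descent is not guaranteed to reach the globally nearest leaf.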

      It was also suggested that support for distance measures other than Euclidean, such as negative dot product or cosine, is necessary.
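
      One way to picture this requirement is an assignment step that takes the dissimilarity function as a parameter. The sketch below is plain Python with hypothetical names (euclidean, cosine_distance, negative_dot, nearest); it does not reflect any existing MLlib API.

```python
from math import sqrt


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    """Standard Euclidean distance."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    return 1.0 - dot(a, b) / (sqrt(dot(a, a)) * sqrt(dot(b, b)))


def negative_dot(a, b):
    """Negated dot product: a larger dot product counts as 'closer'."""
    return -dot(a, b)


def nearest(point, centers, distance=euclidean):
    """Index of the center that minimizes the given dissimilarity."""
    return min(range(len(centers)), key=lambda i: distance(point, centers[i]))


centers = [(1.0, 0.0), (5.0, 5.0)]
point = (1.0, 1.0)
by_euclidean = nearest(point, centers)                 # picks the short vector nearby
by_cosine = nearest(point, centers, cosine_distance)   # picks the center with the same direction
by_neg_dot = nearest(point, centers, negative_dot)     # picks the center with the larger dot product
```

      Note that negative dot product and cosine distance are dissimilarity measures rather than true metrics, so a clustering routine that accepts them cannot rely on properties such as the triangle inequality.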

      Attachments

        1. The Result of Benchmarking a Hierarchical Clustering.pdf
          455 kB
          Yu Ishikawa
        2. benchmark2.html
          477 kB
          Yu Ishikawa
        3. 2014-10-20_divisive-hierarchical-clustering.pdf
          244 kB
          Yu Ishikawa
        4. benchmark-result.2014-10-29.html
          525 kB
          Yu Ishikawa

            People

              yuu.ishikawa@gmail.com Yu Ishikawa
              rnowling R J Nowling
              Xiangrui Meng
              Votes: 2
              Watchers: 15
