  1. Spark
  2. SPARK-4259

Add Power Iteration Clustering Algorithm with Gaussian Similarity Function

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: MLlib

    Description

      In recent years, power iteration clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be run efficiently with standard linear algebra software, and very often outperforms traditional clustering algorithms such as k-means.

      Power iteration clustering is a scalable and efficient algorithm for clustering points given their pairwise affinity values. Internally the algorithm:

      1. computes the Gaussian similarity between all pairs of points and stores these similarities in an affinity matrix
      2. calculates a normalized affinity matrix
      3. estimates the principal eigenvalue and eigenvector of the normalized matrix by power iteration
      4. clusters the input points according to their component values in the principal eigenvector
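
      The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the MLlib implementation proposed in this issue: the function names, the zero diagonal, the degree-based starting vector, the fixed iteration count, and the gap-based split of the resulting vector (a stand-in for running k-means on it) are all choices made for this sketch. The iteration is stopped early on purpose, because for a row-normalized affinity matrix the fully converged principal eigenvector is constant and carries no cluster information.

```python
import math

def gaussian_affinity(points, sigma=1.0):
    """A[i][j] = exp(-||xi - xj||^2 / (2 sigma^2)); diagonal set to 0 (assumption)."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                A[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return A

def power_iterate(W, v0, num_iters=20):
    """Repeatedly apply W and rescale to unit L1 norm; stops after num_iters steps."""
    v = list(v0)
    for _ in range(num_iters):
        v = [sum(w * x for w, x in zip(row, v)) for row in W]
        norm = sum(abs(x) for x in v)
        v = [x / norm for x in v]
    return v

def cluster_by_gaps(v, k):
    """Split the 1-D embedding at its k-1 largest gaps (simple stand-in for k-means)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    gaps = [(v[order[p + 1]] - v[order[p]], p) for p in range(len(order) - 1)]
    cut_set = {p for _, p in sorted(gaps, reverse=True)[:k - 1]}
    labels = [0] * len(v)
    label = 0
    for pos, i in enumerate(order):
        labels[i] = label
        if pos in cut_set:
            label += 1  # the next sorted point starts a new cluster
    return labels

def pic(points, k, sigma=1.0, num_iters=20):
    A = gaussian_affinity(points, sigma)                       # step 1
    degrees = [sum(row) for row in A]
    W = [[a / d for a in row] for row, d in zip(A, degrees)]   # step 2: row-normalize
    volume = sum(degrees)
    v0 = [d / volume for d in degrees]                         # non-uniform start vector
    v = power_iterate(W, v0, num_iters)                        # step 3
    return cluster_by_gaps(v, k)                               # step 4

if __name__ == "__main__":
    # Two well-separated groups of 2-D points (made-up data for illustration).
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)]
    print(pic(pts, k=2))
```

      The sketch builds the full affinity matrix in memory, so it only suits small inputs; a distributed implementation would need a sparse or partitioned representation instead.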

      Details of this algorithm can be found in Lin and Cohen, "Power Iteration Clustering" (www.icml2010.org/papers/387.pdf).

            People

              fjiang6 Fan Jiang
              fjiang6 Fan Jiang
              Xiangrui Meng