  1. Spark
  2. SPARK-4259

Add Power Iteration Clustering Algorithm with Gaussian Similarity Function

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: MLlib
    • Labels:
    • Target Version/s:

      Description

      In recent years, power iteration clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be computed efficiently with standard linear algebra software, and very often outperforms traditional clustering algorithms such as k-means.

      Power iteration clustering is a scalable and efficient algorithm for clustering points given their pairwise affinity values. Internally the algorithm:

      1. computes the Gaussian similarity between all pairs of points and stores these values in an affinity matrix
      2. calculates a normalized affinity matrix
      3. runs power iteration on the normalized matrix to estimate its principal eigenvalue and eigenvector
      4. clusters the input points according to each point's component in that eigenvector
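
The four steps above can be sketched in a few lines of plain Python. This is only an illustrative, single-machine sketch and not the MLlib implementation proposed in this issue: the function names, the `sigma` bandwidth parameter, the degree-based starting vector, the simplified stopping rule, and the small 1-D k-means used for the final step are all assumptions made for the example.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Step 1: A[i][j] = exp(-||xi - xj||^2 / (2 * sigma^2)), with a zero diagonal."""
    n = len(points)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((p - q) ** 2 for p, q in zip(points[i], points[j]))
            a[i][j] = a[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return a


def power_iteration_embedding(a, max_iter=50, tol=1e-5):
    """Steps 2 and 3: row-normalize A, then run a truncated power iteration."""
    degrees = [sum(row) for row in a]  # assumes every point has a nonzero row sum
    w = [[x / d for x in row] for row, d in zip(a, degrees)]  # W = D^-1 A
    total = sum(degrees)
    # Start from the normalized degree vector; a uniform start would be a
    # fixed point of W (its rows sum to 1) and would carry no cluster signal.
    v = [d / total for d in degrees]
    prev_delta = float("inf")
    for _ in range(max_iter):
        wv = [sum(wij * vj for wij, vj in zip(row, v)) for row in w]
        norm = sum(abs(x) for x in wv)
        new_v = [x / norm for x in wv]
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        # Simplified stopping rule (an assumption): stop once the change
        # between successive iterations has itself stopped changing.
        if abs(prev_delta - delta) < tol:
            break
        prev_delta = delta
    return v


def kmeans_1d(values, k, max_iter=100):
    """Step 4 helper: a tiny deterministic 1-D k-means over the embedding."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def power_iteration_clustering(points, k, sigma=1.0):
    """Run steps 1 to 4 and return one cluster label per input point."""
    embedding = power_iteration_embedding(gaussian_affinity(points, sigma))
    return kmeans_1d(embedding, k)


if __name__ == "__main__":
    # Two well-separated groups of different sizes (made-up data).
    demo = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
    print(power_iteration_clustering(demo, k=2))
```

Building the full affinity matrix costs quadratic time and memory in the number of points, so a distributed version would presumably need a sparse or truncated affinity representation; the sketch ignores that concern.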

      Details of this algorithm can be found in "Power Iteration Clustering" by Lin and Cohen: www.icml2010.org/papers/387.pdf
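
For reference, the quantities named in the description can be written out as follows. This is a sketch under stated assumptions: the bandwidth $\sigma$ is a tuning parameter that this issue does not specify, and the row normalization and $L_1$-normalized update are taken from the cited paper rather than from the description itself.

```latex
% Gaussian similarity between points x_i and x_j (sigma is an assumed bandwidth)
A_{ij} = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right), \quad i \neq j, \qquad A_{ii} = 0

% Row-normalized affinity matrix, with D the diagonal degree matrix
D_{ii} = \sum_j A_{ij}, \qquad W = D^{-1} A

% Power iteration update, normalized in the L1 norm
v^{(t+1)} = \frac{W v^{(t)}}{\lVert W v^{(t)} \rVert_1}
```

The iteration is stopped early, and the entries of the resulting vector $v^{(t)}$ serve as a one-dimensional embedding of the points that is then clustered.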

              People

              • Assignee:
                fjiang6 Fan Jiang
                Reporter:
                fjiang6 Fan Jiang
                Shepherd:
                Xiangrui Meng
              • Votes:
                0
                Watchers:
                6
