Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-2694

machine learning

    XMLWordPrintableJSON

Details

    • Documentation
    • Status: Resolved
    • Major
    • Resolution: Incomplete
    • 1.0.0
    • 1.0.0
    • MLlib
    • Linux

    Description

      Machine Learning Algorithm

      Given an initial set of k means m1(1),…,mk(1) (see below), the algorithm proceeds by alternating between two steps:

      Assignment step: Assign each observation to the cluster whose mean yields the least within-cluster sum of squares . Since the sum of squares is the squared Euclidean distance, this is intuitively the "nearest" mean. (Mathematically, this means partitioning the observations according to the Voronoi diagram generated by the means).

      Update step: Calculate the new means to be the centroids of the observations in the new clusters.

      Since the arithmetic mean is a least-squares estimator, this also minimizes the within-cluster sum of squares objective.

      The algorithm has converged when the assignments no longer change. Since both steps optimize the within-cluster sum of squares objective, and there only exists a finite number of such partitionings, the algorithm must converge to a (local) optimum.

      The algorithm is used for assigning objects to the nearest cluster by distance. The standard algorithm aims at minimizing the WCSS objective, and thus assigns by "least sum of squares", which is exactly equivalent to assigning by the smallest Euclidean distance. Using a different distance function other than (squared) Euclidean distance may stop the algorithm from converging.[citation needed] Various modifications of k-means such as spherical k-means and k-medoids have been proposed to allow using other distance measures.

      Attachments

        Activity

          People

            Unassigned Unassigned
            akashoshin Akash
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: