[SPARK-2694] machine learning - ASF JIRA

Details

Type: Documentation
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 1.0.0
Fix Version/s: 1.0.0
Component/s: MLlib
Labels: Algorithm
Environment: Linux
Target Version/s: 1.0.0

Description

Machine Learning Algorithm

Given an initial set of k means $m_1^{(1)}, \ldots, m_k^{(1)}$, the algorithm proceeds by alternating between two steps:

Assignment step: Assign each observation to the cluster whose mean yields the least within-cluster sum of squares. Since the sum of squares is the squared Euclidean distance, this is intuitively the "nearest" mean. (Mathematically, this means partitioning the observations according to the Voronoi diagram generated by the means.)
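The assignment step can be sketched in plain Python as follows. This is a minimal illustration, not MLlib's implementation; the names `squared_distance` and `assign` and the sample points are assumptions made up for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, means):
    """Return, for each point, the index of the mean nearest to it.

    Ties go to the lowest cluster index (an arbitrary choice made here).
    """
    return [
        min(range(len(means)), key=lambda j: squared_distance(p, means[j]))
        for p in points
    ]


# Hypothetical sample data: two obvious groups and one mean near each.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
means = [(1.0, 1.0), (9.0, 9.0)]
labels = assign(points, means)
print(labels)  # [0, 0, 1, 1]
```

Each point only needs its distance to the k means, so one pass costs on the order of n times k distance evaluations.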

Update step: Calculate the new means to be the centroids of the observations in the new clusters.

Since the arithmetic mean is a least-squares estimator, this also minimizes the within-cluster sum of squares objective.
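A minimal Python sketch of the update step is given below. The function name `update`, the sample data, and the handling of empty clusters (keeping the previous mean) are assumptions for illustration; the description above does not say what to do with an empty cluster.

```python
def update(points, labels, old_means):
    """Return the centroid of each cluster as the new mean.

    Assumption: a cluster with no members keeps its previous mean.
    """
    new_means = []
    for j, old in enumerate(old_means):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_means.append(old)
            continue
        # Coordinate-wise arithmetic mean of the cluster's members.
        new_means.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_means


# Hypothetical sample data with a fixed assignment.
points = [(1.0, 1.0), (2.0, 3.0), (8.0, 8.0), (10.0, 10.0)]
labels = [0, 0, 1, 1]
new_means = update(points, labels, [(0.0, 0.0), (5.0, 5.0)])
print(new_means)  # [(1.5, 2.0), (9.0, 9.0)]
```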

The algorithm has converged when the assignments no longer change. Since neither step can increase the within-cluster sum of squares objective, and only finitely many such partitionings exist, the algorithm must converge to a (local) optimum.
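The alternation and the stopping rule can be put together in a short, self-contained Python sketch. The names `k_means` and `wcss`, the `max_iter` safety cap, and the four sample points are assumptions for illustration. The example runs the same data from two starting points to show that the result can be a local rather than a global optimum.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial_means, max_iter=100):
    """Alternate assignment and update until the assignments stop changing."""
    means = list(initial_means)
    labels = None
    for _ in range(max_iter):  # max_iter is only a safety cap
        # Assignment step: index of the nearest mean for every point.
        new_labels = [
            min(range(len(means)), key=lambda j: squared_distance(p, means[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # Update step: each non-empty cluster's mean becomes its centroid.
        for j in range(len(means)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels


def wcss(points, means, labels):
    """Within-cluster sum of squares for a given clustering."""
    return sum(squared_distance(p, means[l]) for p, l in zip(points, labels))


# Corners of a wide, short rectangle (hypothetical data).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
poor_means, poor_labels = k_means(points, [(0.0, 0.0), (0.0, 2.0)])
good_means, good_labels = k_means(points, [(0.0, 0.0), (10.0, 0.0)])
print(poor_labels, wcss(points, poor_means, poor_labels))  # [0, 1, 0, 1] 100.0
print(good_labels, wcss(points, good_means, good_labels))  # [0, 0, 1, 1] 4.0
```

Both runs stop because the assignments no longer change, yet the first settles on a top/bottom split with a much larger objective than the left/right split found by the second. This is why practical implementations usually try several initializations.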

The algorithm assigns objects to the nearest cluster by distance. The standard algorithm aims to minimize the within-cluster sum of squares (WCSS) objective, and thus assigns by "least sum of squares", which is equivalent to assigning by the smallest Euclidean distance. Using a distance function other than (squared) Euclidean distance may stop the algorithm from converging. Modifications of k-means such as spherical k-means and k-medoids have been proposed to allow other distance measures.
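The small Python sketch below illustrates both points on made-up one-dimensional data; all names in it are assumptions for the example. First, the nearest mean is the same whether distances are squared or not, because the square root is monotone. Second, the arithmetic mean minimizes the sum of squared distances but not the sum of absolute (Manhattan) distances, so a mean-based update step is no longer guaranteed to lower the objective under that distance. This is one way the convergence argument can break down.

```python
import math
from statistics import mean, median

# Nearest mean by squared distance vs. plain Euclidean distance.
point = (2.0, 1.0)
means = [(0.0, 0.0), (5.0, 1.0)]


def sq(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


nearest_squared = min(range(len(means)), key=lambda j: sq(point, means[j]))
nearest_plain = min(range(len(means)), key=lambda j: math.sqrt(sq(point, means[j])))

# One cluster of 1-D values, and two candidate centers for it.
values = [0.0, 0.0, 10.0]
center_mean = mean(values)      # about 3.33
center_median = median(values)  # 0.0


def squared_cost(c):
    return sum((x - c) ** 2 for x in values)


def absolute_cost(c):
    return sum(abs(x - c) for x in values)


# The mean wins under squared distance; the median wins under absolute distance.
print(squared_cost(center_mean) < squared_cost(center_median))    # True
print(absolute_cost(center_median) < absolute_cost(center_mean))  # True
```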

People

Assignee: Unassigned

Reporter: Akash

Votes: 0

Watchers: 2

Dates

Created: 25/Jul/14 19:29

Updated: 26/Jul/14 20:11

Resolved: 26/Jul/14 20:11