[SPARK-6001] K-Means clusterer should return the assignments of input points to clusters - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.2.1
Fix Version/s: 1.5.0
Component/s: MLlib
Labels: None

Description

The K-Means clusterer returns a KMeansModel that contains the cluster centers. I suggest that, when they are available, the clusterer also return an RDD of the assignments of the input points to clusters. The assignments can be recomputed from the KMeansModel, but returning them when they are already available saves the cost of that re-computation.

The K-means implementation at https://github.com/derrickburns/generalized-kmeans-clustering returns the assignments when available.
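
To illustrate the re-computation cost mentioned above, here is a minimal plain-Python sketch of Lloyd's algorithm that returns the assignments together with the centers. It is not Spark's MLlib code and not the linked project's code; the function names (`kmeans_with_assignments`, `closest_center`, `squared_distance`) and parameters are hypothetical, chosen only for this example. The point it shows is that the final assignment step already produces the point-to-cluster mapping, so returning it costs nothing extra.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_center(point, centers):
    """Index of the center nearest to `point` (lowest index wins ties)."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def kmeans_with_assignments(points, k, max_iterations=20, seed=0):
    """Hypothetical sketch: Lloyd's algorithm returning (centers, assignments).

    `assignments[i]` is the index of the center assigned to `points[i]`.
    """
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct input points chosen at random.
    centers = [list(p) for p in rng.sample(list(points), k)]
    assignments = [closest_center(p, centers) for p in points]

    for _ in range(max_iterations):
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                dim = len(members[0])
                new_centers.append(
                    [sum(m[d] for m in members) / len(members)
                     for d in range(dim)])
            else:
                # Keep the old center if no point was assigned to it.
                new_centers.append(centers[i])
        centers = new_centers

        # Assignment step: this is the work a caller would otherwise repeat.
        new_assignments = [closest_center(p, centers) for p in points]
        if new_assignments == assignments:
            break
        assignments = new_assignments

    # The assignments are always consistent with the returned centers.
    return centers, assignments
```

A caller that receives only the centers would have to run the assignment step once more over the whole data set to obtain the same list that the loop has already computed.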

Issue Links

Is contained by: SPARK-7879 KMeans API for spark.ml Pipelines (Resolved)
Is related to: SPARK-7674 R-like stats for ML models (Resolved)

People

Assignee: Yu Ishikawa
Reporter: Derrick Burns
Votes: 0
Watchers: 6

Dates

Created: 25/Feb/15 08:21
Updated: 05/Nov/15 00:43
Resolved: 05/Nov/15 00:43