Description
The K-Means clusterer returns a KMeansModel that contains the cluster centers. I suggest that, when the assignments of the input data to the clusters are available, the K-Means clusterer also return them as an RDD. The assignments can be recomputed from the KMeansModel, but returning them when they are already available would save that re-computation cost.
The K-means implementation at https://github.com/derrickburns/generalized-kmeans-clustering returns the assignments when available.
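To illustrate the request, here is a minimal single-machine sketch in plain Python, not Spark code. It assumes a simple Lloyd-style loop in which the assignment step has already run against the final centers by the time the loop exits, so returning the assignments costs nothing extra. The names `kmeans_with_assignments`, `closest_center`, and `squared_distance` are hypothetical and are not part of the MLlib API or of the linked implementation.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_center(point: Point, centers: Sequence[Point]) -> int:
    """Index of the nearest center (ties go to the lowest index)."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def kmeans_with_assignments(
    points: Sequence[Point],
    initial_centers: Sequence[Point],
    max_iterations: int = 20,
) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical sketch: return (centers, assignments) together.

    The returned assignments always correspond to the returned centers,
    so the caller never needs a separate pass over the data.
    """
    centers = [list(c) for c in initial_centers]
    assignments = [closest_center(p, centers) for p in points]

    for _ in range(max_iterations):
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                new_centers.append(
                    [sum(m[d] for m in members) / len(members)
                     for d in range(len(old))]
                )
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(old)

        # Assignment step against the updated centers.
        new_assignments = [closest_center(p, new_centers) for p in points]
        centers = new_centers
        converged = new_assignments == assignments
        assignments = new_assignments
        if converged:
            break

    return centers, assignments


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, assignments = kmeans_with_assignments(
    points, [(0.0, 0.0), (10.0, 10.0)]
)
```

In a distributed setting the same idea would presumably apply only when the final assignment pass is already materialized, which is the "when available" condition in the description.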
Issue Links
- Is contained by: SPARK-7879 KMeans API for spark.ml Pipelines (Resolved)
- Is related to: SPARK-7674 R-like stats for ML models (Resolved)