Description
Add K-Means to the spark.ml Pipelines API. The new API should wrap the existing KMeans implementation in spark.mllib.
K-Means should be the first clustering method added to Pipelines, so it is important to consider SPARK-7610 and think about the design of the clustering API as a whole. We do not need abstractions from the beginning (and probably should not add them yet), but we should plan far enough ahead that they can be added later.
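The wrapping approach described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not Spark code: `legacy_kmeans_train`, `KMeansSketch`, `KMeansModelSketch`, and the column names are hypothetical stand-ins. It assumes a Pipelines-style estimator whose `fit` delegates to an existing low-level trainer, and whose fitted model's `transform` appends a cluster assignment to each row (the behavior requested in SPARK-6001).

```python
import random
from dataclasses import dataclass
from typing import Dict, List


def _sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def legacy_kmeans_train(points, k, max_iter, seed):
    """Hypothetical stand-in for an existing low-level trainer (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(max_iter):
        # Assignment step: group each point with its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: _sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group;
        # an empty group keeps its previous center.
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


@dataclass
class KMeansModelSketch:
    """Fitted model (hypothetical): appends a prediction column to each row."""
    cluster_centers: List[List[float]]
    features_col: str = "features"
    prediction_col: str = "prediction"

    def transform(self, rows: List[Dict]) -> List[Dict]:
        out = []
        for row in rows:
            p = row[self.features_col]
            nearest = min(
                range(len(self.cluster_centers)),
                key=lambda i: _sq_dist(p, self.cluster_centers[i]),
            )
            # Copy the row so the input data is left unchanged.
            out.append({**row, self.prediction_col: nearest})
        return out


@dataclass
class KMeansSketch:
    """Estimator (hypothetical): a thin wrapper that delegates training."""
    k: int = 2
    max_iter: int = 20
    seed: int = 0
    features_col: str = "features"
    prediction_col: str = "prediction"

    def fit(self, rows: List[Dict]) -> KMeansModelSketch:
        points = [row[self.features_col] for row in rows]
        centers = legacy_kmeans_train(points, self.k, self.max_iter, self.seed)
        return KMeansModelSketch(centers, self.features_col, self.prediction_col)


rows = [
    {"features": [0.0, 0.0]},
    {"features": [0.0, 1.0]},
    {"features": [10.0, 10.0]},
    {"features": [10.0, 11.0]},
]
model = KMeansSketch(k=2).fit(rows)
predictions = [r["prediction"] for r in model.transform(rows)]
```

In this sketch the estimator and the model are two concrete classes with no shared clustering base class. That is one way to defer the abstractions discussed in SPARK-7610 while leaving room to introduce a common interface later.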
Issue Links
- contains
  - SPARK-6001 K-Means clusterer should return the assignments of input points to clusters (Resolved)
- is duplicated by
  - SPARK-7881 KMeans API for spark.ml Pipelines (Closed)
- is related to
  - SPARK-7610 Design clustering abstractions for Pipelines API (Resolved)
- relates to
  - SPARK-12215 User guide section for KMeans in spark.ml (Resolved)
  - SPARK-9149 Add an example of spark.ml KMeans (Resolved)