Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Not A Problem
Description
MiniBatchKMeans is a variant of the KMeans algorithm that uses mini-batches to reduce computation time while still attempting to optimise the same objective function. Mini-batches are subsets of the input data, randomly sampled in each training iteration. They drastically reduce the amount of computation required to converge to a local solution. In contrast to other algorithms that reduce the convergence time of k-means, mini-batch k-means produces results that are generally only slightly worse than those of the standard algorithm.
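To make the mini-batch idea concrete, here is a minimal pure-Python sketch. It assumes the per-center learning-rate update from Sculley's web-scale k-means, which scikit-learn's documentation cites for MiniBatchKMeans. It is not the code from the linked pull request: `mini_batch_kmeans`, `nearest`, `sq_dist` and `cost` are hypothetical helpers, and initialisation is plain random sampling rather than k-means++.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, p):
    """Index of the center closest to point p (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda j: sq_dist(centers[j], p))

def cost(data, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(c, p) for c in centers) for p in data)

def mini_batch_kmeans(data, k, batch_size, iterations, seed=0):
    """Hypothetical helper: mini-batch k-means with a per-center learning rate."""
    rng = random.Random(seed)
    # Initialise centers with k randomly chosen data points (not k-means++).
    centers = [list(p) for p in rng.sample(data, k)]
    counts = [0] * k  # number of points each center has absorbed so far
    for _ in range(iterations):
        # Sample a mini-batch (with replacement) instead of scanning all the data.
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Assign every batch point using the centers as they were at the start of the iteration.
        assignments = [nearest(centers, p) for p in batch]
        # Move each assigned center toward its point with step size 1 / count.
        for p, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers

# Usage on two synthetic, well-separated 2-D blobs.
rng = random.Random(42)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(200)]
centers = mini_batch_kmeans(data, k=2, batch_size=20, iterations=50)
print("centers:", centers, "cost:", cost(data, centers))
```

Each iteration touches only `batch_size` points instead of the whole data set, which is where the savings come from; because the step size `1 / count` shrinks as a center absorbs more points, the centers settle down over time.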
A comparison of KMeans and MiniBatchKMeans in scikit-learn: http://scikit-learn.org/stable/auto_examples/cluster/plot_mini_batch_kmeans.html#example-cluster-plot-mini-batch-kmeans-py
Since MiniBatch-KMeans with fraction=1.0 is not equivalent to KMeans, I have made it a separate estimator.
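A small worked example of why a full-data mini-batch (fraction=1.0) need not coincide with KMeans, assuming the same Sculley-style update in which each center's learning rate is 1 / (points absorbed so far) and the counts carry over between iterations. The issue does not say which update rule the pull request uses, so this only illustrates one plausible reason; `lloyd_step` and `full_batch_minibatch_step` are hypothetical names, and exact fractions are used so the results are reproducible.

```python
from fractions import Fraction

def nearest(centers, x):
    """Index of the closest 1-D center (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda j: abs(centers[j] - x))

def lloyd_step(data, centers):
    """One standard KMeans (Lloyd) iteration: each center becomes the mean of its points."""
    groups = [[] for _ in centers]
    for x in data:
        groups[nearest(centers, x)].append(x)
    # Keep a center unchanged if no point was assigned to it.
    return [Fraction(sum(g), len(g)) if g else c for g, c in zip(groups, centers)]

def full_batch_minibatch_step(data, centers, counts):
    """One mini-batch iteration whose batch is the whole data set (fraction=1.0).
    Assumes step size 1 / count per center; `counts` is updated in place and persists."""
    assignments = [nearest(centers, x) for x in data]
    centers = list(centers)
    for x, j in zip(data, assignments):
        counts[j] += 1
        eta = Fraction(1, counts[j])
        centers[j] = (1 - eta) * centers[j] + eta * x
    return centers

data = [0, 1, 9, 10]
km = [Fraction(0), Fraction(1)]
mb, counts = [Fraction(0), Fraction(1)], [0, 0]
for _ in range(2):
    km = lloyd_step(data, km)
    mb = full_batch_minibatch_step(data, mb, counts)
print("KMeans:       ", [str(c) for c in km])  # ['1/2', '19/2']
print("MiniBatch 1.0:", [str(c) for c in mb])  # ['1/3', '39/5']
```

After the first iteration both variants give [0, 20/3]. They diverge in the second because the point 1 switches clusters: Lloyd's step recomputes each center from the current assignment only, while the mini-batch center keeps averaging in every point it absorbed in earlier iterations.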
Attachments
Issue Links
- duplicates: SPARK-2308 Add KMeans MiniBatch clustering algorithm to MLlib (Resolved)
- is duplicated by: SPARK-6000 Batch K-Means clusters should support "mini-batch" updates (Closed)
- links to: pull request created by user 'zhengruifeng' for this issue: https://github.com/apache/spark/pull/11974