Details
-
Improvement
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
3.6.1
-
None
-
None
Description
As the major difference of KMeans++ to classic Kmeans, an initial distance square sampling process is executed, the function is named "chooseInitialCenters" in the current implementation. The time complexity is O(dkn), where d is the dimension, k is the cluster number and n is the number of points.
In paper "Ackermann, Lammersen, Märtens, Raupach, Sohler, Swierkot. StreamKM++: A Clustering Algorithm for Data Streams. In Proceedings of the 12th Workshop on Algorithm Engineering and Experiments (ALENEX 2010)", the authors introduced a data structure named "coreset tree" which can reduce the complexity of the initial sampling to O(dn log k). This is useful in the scenario when value of k is large, say 1000, then the run speed of the algorithm will be much faster.
Attachments
Issue Links
- is related to
-
MATH-1515 Enhance clustering API
- Open