Details
-
Bug
-
Status: Resolved
-
Minor
-
Resolution: Won't Fix
-
1.2.1
-
None
Description
The test "k-means|| initialization in KMeansSuite can fail when the random number generator is truly random.
The test is predicated on the assumption that each round of K-Means || will add at least one new cluster center. The current implementation of K-Means || adds 2*k cluster centers with high probability. However, there is no deterministic lower bound on the number of cluster centers added.
Choices are:
1) change the KMeans || implementation to iterate on selecting points until it has satisfied a lower bound on the number of points chosen.
2) eliminate the test
3) ignore the problem and depend on the random number generator to sample the space in a lucky manner.
Option (1) is most in keeping with the contract that KMeans || should provide a precise number of cluster centers when possible.