Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.3.0
Fix Version/s: None
Description
There are two initialization modes in KMeans: random mode and parallel mode (k-means||).
The parallel mode uses kMeansPlusPlus to generate the initial center points, but kMeansPlusPlus is a local method that runs in the driver.
If the number of candidate points passed to kMeansPlusPlus is large, it runs for a long time, because it is a single-threaded method running in the driver.
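A minimal sketch of what a driver-local, single-threaded k-means++ seeding step does, written in Scala since Spark MLlib is Scala. The object and method names (LocalKMeansPlusPlusSketch, initCenters, pick, sqDist) are hypothetical, and this is not Spark's LocalKMeans code; Spark's version also runs Lloyd iterations after seeding, which are omitted here. The loop that refreshes every point's distance to its nearest center after each new center is chosen is an O(n · d) pass per center, all on one thread.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Hypothetical sketch of weighted k-means++ seeding on a single thread.
// Not Spark's implementation; seeding only, no Lloyd iterations.
object LocalKMeansPlusPlusSketch {
  type Point = Array[Double]

  // Squared Euclidean distance; assumes both points have the same dimension.
  def sqDist(a: Point, b: Point): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { val d = a(i) - b(i); s += d * d; i += 1 }
    s
  }

  // Samples index i with probability proportional to weights(i) * costs(i).
  def pick(weights: Array[Double], costs: Array[Double], rand: Random): Int = {
    val contrib = weights.indices.map(i => weights(i) * costs(i))
    val total = contrib.sum
    if (total <= 0.0) return rand.nextInt(costs.length) // every point already covered
    val r = rand.nextDouble() * total
    var acc = 0.0
    var i = 0
    while (i < contrib.length) {
      acc += contrib(i)
      if (acc > r) return i
      i += 1
    }
    contrib.lastIndexWhere(_ > 0.0) // guards against floating-point round-off
  }

  def initCenters(points: Array[Point], weights: Array[Double], k: Int, seed: Long): Array[Point] = {
    require(points.nonEmpty && points.length == weights.length && k > 0)
    val rand = new Random(seed)
    // First center: sampled proportionally to weight alone.
    val first = points(pick(weights, Array.fill(points.length)(1.0), rand))
    val centers = ArrayBuffer(first)
    // costs(i) = squared distance from points(i) to its nearest chosen center.
    val costs = points.map(p => sqDist(p, first))
    while (centers.length < k) {
      val next = points(pick(weights, costs, rand))
      centers += next
      // Sequential O(n * d) update per new center: the slow single-threaded part.
      var i = 0
      while (i < points.length) {
        costs(i) = math.min(costs(i), sqDist(points(i), next))
        i += 1
      }
    }
    centers.toArray
  }
}
```

For example, with two copies of (0, 0), two copies of (10, 10) and k = 2, the second center always comes from the other group, because points identical to the first center have zero cost and can never be sampled.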
We could parallelize this method to make it faster, for example by using Scala parallel collections.
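A sketch of one way to parallelize the seeding step, under stated assumptions. It swaps in java.util.stream parallel streams instead of Scala parallel collections, because on Scala 2.13 parallel collections live in the separate scala-parallel-collections module, and this keeps the example dependency-free. Only the per-center distance update runs in parallel; the weighted sampling stays a sequential scan. The names (ParallelKMeansPlusPlusSketch, initCenters, pick, sqDist) are hypothetical, and this is not a proposed Spark patch.

```scala
import java.util.stream.IntStream
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Hypothetical sketch: k-means++ seeding with the distance update run in parallel.
object ParallelKMeansPlusPlusSketch {
  type Point = Array[Double]

  // Squared Euclidean distance; assumes both points have the same dimension.
  def sqDist(a: Point, b: Point): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { val d = a(i) - b(i); s += d * d; i += 1 }
    s
  }

  // Sequential weighted sampling: index i with probability ~ weights(i) * costs(i).
  def pick(weights: Array[Double], costs: Array[Double], rand: Random): Int = {
    val contrib = weights.indices.map(i => weights(i) * costs(i))
    val total = contrib.sum
    if (total <= 0.0) return rand.nextInt(costs.length)
    val r = rand.nextDouble() * total
    var acc = 0.0
    var i = 0
    while (i < contrib.length) {
      acc += contrib(i)
      if (acc > r) return i
      i += 1
    }
    contrib.lastIndexWhere(_ > 0.0)
  }

  def initCenters(points: Array[Point], weights: Array[Double], k: Int, seed: Long): Array[Point] = {
    require(points.nonEmpty && points.length == weights.length && k > 0)
    val rand = new Random(seed)
    val first = points(pick(weights, Array.fill(points.length)(1.0), rand))
    val centers = ArrayBuffer(first)
    val costs = points.map(p => sqDist(p, first))
    while (centers.length < k) {
      val next = points(pick(weights, costs, rand))
      centers += next
      // Parallel update: each index is read and written by exactly one task,
      // so no locking is needed; forEach returns only after all tasks finish.
      IntStream.range(0, points.length).parallel().forEach { i =>
        val d = sqDist(points(i), next)
        if (d < costs(i)) costs(i) = d
      }
    }
    centers.toArray
  }
}
```

Because each point's minimum is computed independently, the centers chosen for a given seed do not depend on thread scheduling. With Scala parallel collections the update would read `points.indices.par.foreach { i => ... }`, which on Scala 2.13 also needs `import scala.collection.parallel.CollectionConverters._`. For small candidate sets, task-scheduling overhead may outweigh the gain.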