Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
Description
The LocalKMeans method should be replaced with a parallel implementation. As it stands, it becomes a bottleneck on large data sets.
I have implemented this in my version of the clusterer. However, I see that there are hundreds of outstanding pull requests. If someone on the team is willing to sponsor the pull request, I will create one. Otherwise, I will just maintain my own private fork of the clusterer.
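For readers unfamiliar with the idea, here is a minimal sketch of what parallelizing a k-means (Lloyd) iteration can look like. It is not the implementation mentioned above and does not use any Spark API. The function names (`closest`, `partial_stats`, `lloyd_step`, `parallel_kmeans`) and the `workers` parameter are hypothetical, chosen only for illustration. The sketch assumes plain Python lists as points and uses a thread pool to show the map-then-merge structure. In CPython, threads alone will not speed up this pure-Python arithmetic, so a real implementation would likely rely on processes, a vectorized library, or a distributed framework.

```python
from concurrent.futures import ThreadPoolExecutor


def closest(point, centers):
    """Return the index of the center nearest to point (squared Euclidean)."""
    best, best_dist = 0, float("inf")
    for i, c in enumerate(centers):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_dist:
            best, best_dist = i, d
    return best


def partial_stats(chunk, centers):
    """Per-chunk coordinate sums and point counts for each cluster."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for point in chunk:
        k = closest(point, centers)
        counts[k] += 1
        for j, x in enumerate(point):
            sums[k][j] += x
    return sums, counts


def lloyd_step(points, centers, workers=4):
    """One Lloyd iteration with the assignment step split across workers."""
    size = max(1, -(-len(points) // workers))  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda ch: partial_stats(ch, centers), chunks))
    dim = len(centers[0])
    new_centers = []
    for k, old in enumerate(centers):
        # Merge the partial counts and sums produced by every chunk.
        total = sum(counts[k] for _, counts in results)
        if total == 0:
            new_centers.append(list(old))  # keep the center of an empty cluster
            continue
        new_centers.append(
            [sum(sums[k][j] for sums, _ in results) / total for j in range(dim)]
        )
    return new_centers


def parallel_kmeans(points, centers, iterations=10, workers=4):
    """Run a fixed number of Lloyd iterations from the given initial centers."""
    for _ in range(iterations):
        centers = lloyd_step(points, centers, workers)
    return centers
```

Each worker only needs the current centers and its own chunk of points, and the merge step touches k sums and counts per chunk, which is why this step parallelizes well.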
Issue Links
- is duplicated by SPARK-6706 kmeans|| hangs for a long time if both k and vector dimension are large (Resolved)