Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 1.2.1, 1.3.0
- Fix Version/s: None
- Environment: Windows 64bit, Linux 64bit
Description
When running k-means clustering with the "k-means||" initialization algorithm, which is the default, the job finishes some collect() jobs and then the driver hangs for a long time.
Settings:
- k above 100
- feature dimension about 360
- total data size is about 100 MB
The issue was first noticed with Spark 1.2.1; I tested in both local and cluster mode. On Spark 1.3.0, I can also reproduce the issue in local mode. *However, I do not have a 1.3.0 cluster environment to test on.*
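A minimal sketch of how the settings above could be turned into a reproduction with PySpark's MLlib `KMeans.train`. The data here is synthetic and everything specific is an assumption made for illustration: the helper names (`estimate_rows`, `make_points`, `run_kmeans`), `k = 120` as one value above 100, uniform random features, and reading "100 MB" as dense 8-byte doubles. The original data set and code are not part of this report.

```python
import random


def estimate_rows(total_bytes, dim, bytes_per_value=8):
    """Rough row count for a dense data set of total_bytes (assumes 8-byte doubles)."""
    return total_bytes // (dim * bytes_per_value)


def make_points(n_rows, dim, seed=42):
    """Generate n_rows dense vectors of uniform random floats (synthetic stand-in data)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n_rows)]


def run_kmeans(sc, points, k, init_mode="k-means||"):
    """Train MLlib k-means on an existing SparkContext `sc`.

    PySpark is imported lazily so the helpers above work without it.
    """
    from pyspark.mllib.clustering import KMeans

    rdd = sc.parallelize(points).cache()
    return KMeans.train(rdd, k, maxIterations=20, initializationMode=init_mode)


if __name__ == "__main__":
    dim, k = 360, 120  # assumed values matching "about 360" and "above 100"
    n_rows = estimate_rows(100 * 1024 * 1024, dim)
    print(n_rows)
    # With a live SparkContext `sc` (not created here):
    # model = run_kmeans(sc, make_points(n_rows, dim), k)
```

Calling `run_kmeans` with `init_mode="random"` instead of the default may help check whether the hang is tied to the "k-means||" initialization step, which would be consistent with the linked duplicate issue.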
Attachments
Issue Links
- duplicates SPARK-3220 K-Means clusterer should perform K-Means initialization in parallel (Resolved)