- Type: Documentation
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 0.9.1
- Fix Version/s: 0.9.3
- Component/s: Documentation, Examples, MLlib, PySpark
- Labels:
- Environment: All
The MLlib clustering section of the documentation contains a KMeans example:
http://spark.apache.org/docs/0.9.1/mllib-guide.html#clustering-2
The following line in that example is wrong and misleading:
clusters = KMeans.train(parsedData, 2, maxIterations=10,runs=30, initialization_mode="random")
The keyword argument "initialization_mode" in this example is wrong: it does not match the implementation of KMeans, where the parameter is named "initializationMode".
Correction:
clusters = KMeans.train(parsedData, 2, maxIterations=10,runs=30, initializationMode="random")
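To illustrate why the misspelled keyword matters, here is a minimal sketch that runs without Spark. The train function below is a hypothetical stand-in, not the PySpark implementation: only its keyword names are taken from the corrected call above, its default values are placeholders, and it assumes the real method does not accept arbitrary extra keyword arguments. Under that assumption, Python rejects the documented call with a TypeError before any clustering starts.

```python
# Hypothetical stand-in for KMeans.train: NOT the PySpark implementation.
# Only the keyword names come from the corrected call in this ticket;
# the default values are placeholders chosen for illustration.
def train(data, k, maxIterations=100, runs=1, initializationMode="k-means||"):
    # Return the resolved settings instead of clustering anything.
    return {
        "k": k,
        "maxIterations": maxIterations,
        "runs": runs,
        "initializationMode": initializationMode,
    }


# Toy input standing in for the parsedData RDD of the guide.
parsedData = [[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]]

# Misspelled keyword, as in the 0.9.1 guide: Python rejects the call
# because the function has no parameter with that name.
try:
    train(parsedData, 2, maxIterations=10, runs=30, initialization_mode="random")
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

# Corrected keyword: the call is accepted and the mode is passed through.
settings = train(parsedData, 2, maxIterations=10, runs=30, initializationMode="random")
print(settings["initializationMode"])
```

The first call fails on the keyword name alone, so a reader copying the example from the guide would hit the error regardless of the data passed in.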