Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Version: 0.6.3
Description
Fix problems in the KMeans example and revise the test case.
1) Typo [1] and input path issue
2) Wrong summationCount in assignCentersInternal
summationCount should also be incremented in the branch where the center is first set [2]:
if (clusterCenter == null) { newCenterArray[lowestDistantCenter] = key; }
Otherwise summationCount may stay zero when only one value is assigned to a center. That zero is then propagated to incrementSum [3] and might cause a division by zero in [4].
The same undercount also gives wrong results in the general case: if three vectors are added but summationCount is only two, the new center is wrong, because the summed vector is later divided by the number of increments.
3) Results depend on numBspTask
(the results vary when the number of BSP tasks is changed)
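The counting problem in item 2 can be reproduced outside Hama with a small sketch. The class below is not the KMeansBSP code; `CenterSumSketch`, `CenterSum`, `addBuggy`, `addFixed`, and `meanOf` are hypothetical names chosen for illustration, and the explicit zero-count guard is an assumption made so that the single-vector case fails visibly.

```java
import java.util.Arrays;

/** Hypothetical, simplified stand-in for the per-center accumulation; not the Hama code. */
public class CenterSumSketch {

    /** Running sum of the vectors assigned to one center, plus the number counted. */
    static final class CenterSum {
        double[] sum; // null until the first vector is assigned
        int count;    // plays the role of summationCount

        /** Buggy variant: the first assignment stores the vector but does not count it. */
        void addBuggy(double[] v) {
            if (sum == null) {
                sum = v.clone();
            } else {
                addInto(v);
                count++;
            }
        }

        /** Fixed variant: every assigned vector increments the count. */
        void addFixed(double[] v) {
            if (sum == null) {
                sum = v.clone();
            } else {
                addInto(v);
            }
            count++;
        }

        private void addInto(double[] v) {
            for (int i = 0; i < sum.length; i++) {
                sum[i] += v[i];
            }
        }

        /** Divides the summed vector by the count; the guard is an assumption of this sketch. */
        double[] mean() {
            if (count == 0) {
                throw new IllegalStateException("count is zero, cannot average");
            }
            double[] m = new double[sum.length];
            for (int i = 0; i < sum.length; i++) {
                m[i] = sum[i] / count;
            }
            return m;
        }
    }

    /** Accumulates all vectors with either variant and returns the resulting mean. */
    static double[] meanOf(boolean fixed, double[]... vectors) {
        CenterSum acc = new CenterSum();
        for (double[] v : vectors) {
            if (fixed) {
                acc.addFixed(v);
            } else {
                acc.addBuggy(v);
            }
        }
        return acc.mean();
    }

    public static void main(String[] args) {
        double[][] vs = {{1, 1}, {2, 2}, {3, 3}};
        System.out.println(Arrays.toString(meanOf(true, vs)));  // [2.0, 2.0]
        System.out.println(Arrays.toString(meanOf(false, vs))); // [3.0, 3.0]
    }
}
```

With three vectors the buggy variant divides a sum of three by a count of two, and with a single vector the count stays at zero, which matches the two failure cases described in item 2.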
[1] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L518-519
[2] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L249
[3] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L161
[4] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L172