Details
Type: Bug
Status: Patch Available
Priority: Major
Resolution: Unresolved
Description
I notice that the BallKMeans.iterativeAssignment method uses the following code to calculate weights:
BallKMeans.java
for (WeightedVector datapoint : datapoints) {
  Centroid closestCentroid = (Centroid) centroids.searchFirst(datapoint, false).getValue();
  closestCentroid.setWeight(closestCentroid.getWeight() + datapoint.getWeight());
}
In MAHOUT-1237, the buggy code calculated the weight in the same way:
ClusteringUtils.java
for (Vector vector : datapoints) {
  Centroid closest = (Centroid) centroids.searchFirst(vector, false).getValue();
  totalCost += closest.getWeight();
}
The fixed code is as follows:
ClusteringUtils.java
for (Vector vector : datapoints) {
  totalCost += centroids.searchFirst(vector, false).getWeight();
}
I am not quite sure whether BallKMeans.iterativeAssignment sets the right weights. Please check it.
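To make the comparison concrete, here is a minimal, self-contained sketch of the three patterns quoted in this report. It does not use Mahout: SketchCentroid, SketchResult and WeightSketch are hypothetical stand-ins, the points are one-dimensional, and the weight on a search result is assumed to be the distance from the query to the match (which is what the MAHOUT-1237 fix suggests). It only shows that the two getWeight() calls can return different quantities; it does not settle whether BallKMeans.iterativeAssignment is wrong.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a centroid: a 1-D position plus its own weight.
final class SketchCentroid {
    private final double position;
    private double weight;

    SketchCentroid(double position, double weight) {
        this.position = position;
        this.weight = weight;
    }

    double getPosition() { return position; }
    double getWeight() { return weight; }
    void setWeight(double weight) { this.weight = weight; }
}

// Hypothetical stand-in for a search result: the matched centroid plus a
// weight that is ASSUMED to be the distance from the query to that centroid.
final class SketchResult {
    private final SketchCentroid value;
    private final double weight;

    SketchResult(SketchCentroid value, double weight) {
        this.value = value;
        this.weight = weight;
    }

    SketchCentroid getValue() { return value; }
    double getWeight() { return weight; }
}

final class WeightSketch {
    // Linear scan for the nearest centroid; assumes a non-empty list.
    static SketchResult searchFirst(List<SketchCentroid> centroids, double point) {
        SketchCentroid best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (SketchCentroid c : centroids) {
            double distance = Math.abs(c.getPosition() - point);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return new SketchResult(best, bestDistance);
    }

    // Pattern of the buggy code: sums the closest centroid's own weight.
    static double costFromCentroidWeights(List<SketchCentroid> centroids, double[] points) {
        double totalCost = 0.0;
        for (double point : points) {
            totalCost += searchFirst(centroids, point).getValue().getWeight();
        }
        return totalCost;
    }

    // Pattern of the fixed code: sums the weight of the search result (the distance).
    static double costFromDistances(List<SketchCentroid> centroids, double[] points) {
        double totalCost = 0.0;
        for (double point : points) {
            totalCost += searchFirst(centroids, point).getWeight();
        }
        return totalCost;
    }

    // Pattern of the BallKMeans loop: adds each datapoint's weight to its closest centroid.
    static void assignWeights(List<SketchCentroid> centroids, double[] points, double[] pointWeights) {
        for (int i = 0; i < points.length; i++) {
            SketchCentroid closest = searchFirst(centroids, points[i]).getValue();
            closest.setWeight(closest.getWeight() + pointWeights[i]);
        }
    }

    public static void main(String[] args) {
        List<SketchCentroid> centroids = new ArrayList<>();
        centroids.add(new SketchCentroid(0.0, 5.0));
        centroids.add(new SketchCentroid(10.0, 7.0));
        double[] points = {1.0, 2.0, 9.0};
        System.out.println(costFromCentroidWeights(centroids, points)); // 5 + 5 + 7 = 17.0
        System.out.println(costFromDistances(centroids, points));       // 1 + 2 + 1 = 4.0
    }
}
```

In this sketch the first and third patterns both read the centroid's own weight through getValue(), but only the first one uses it as a cost. The third pattern accumulates datapoint weights into the centroid, so whether it is affected depends on what weight the centroids carry before the loop runs, which the snippet above does not show.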