Description
Improved the speed of KMeans by passing only the cluster ID from the mapper to the reducer. Previously, the whole cluster info was being sent as a formatted string.
Also removed the implicit assumption that the combiner runs exactly once. The code is modified accordingly so that it no longer produces a bug when the combiner runs zero times or more than once.
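To illustrate the combiner point, here is a minimal sketch in plain Java with no Hadoop dependency. It is not the code from the patch: the class and method names (CombinerSafeKMeans, Partial, mergeAll) are hypothetical. It assumes each value sent under a cluster ID key is a partial aggregate (point count plus per-dimension sum), so the same merge can serve as both combiner and reducer and the resulting centroid does not depend on how many times the combiner ran.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Mahout's actual classes.
class CombinerSafeKMeans {

    /** Partial aggregate for one cluster ID: number of points and their per-dimension sum. */
    static final class Partial {
        final long count;
        final double[] sum;

        Partial(long count, double[] sum) {
            this.count = count;
            this.sum = sum;
        }

        /** What a mapper would emit for a single point: count 1, sum equal to the point. */
        static Partial ofPoint(double[] point) {
            return new Partial(1, point.clone());
        }

        /** Associative and commutative merge, so it can be applied any number of times. */
        Partial merge(Partial other) {
            double[] merged = sum.clone();
            for (int i = 0; i < merged.length; i++) {
                merged[i] += other.sum[i];
            }
            return new Partial(count + other.count, merged);
        }

        /** Only the final reducer step divides by the count. */
        double[] centroid() {
            double[] c = new double[sum.length];
            for (int i = 0; i < c.length; i++) {
                c[i] = sum[i] / count;
            }
            return c;
        }
    }

    /** Shared by combiner and reducer; assumes at least one value per cluster ID. */
    static Partial mergeAll(List<Partial> values) {
        Partial acc = values.get(0);
        for (int i = 1; i < values.size(); i++) {
            acc = acc.merge(values.get(i));
        }
        return acc;
    }

    public static void main(String[] args) {
        Partial a = Partial.ofPoint(new double[] {1, 2});
        Partial b = Partial.ofPoint(new double[] {3, 4});
        Partial c = Partial.ofPoint(new double[] {5, 6});
        Partial d = Partial.ofPoint(new double[] {7, 8});

        // Combiner never runs: the reducer sees the raw mapper output.
        Partial zeroRuns = mergeAll(Arrays.asList(a, b, c, d));

        // Combiner runs once on each of two map-side groups.
        Partial oneRun = mergeAll(Arrays.asList(
                mergeAll(Arrays.asList(a, b)),
                mergeAll(Arrays.asList(c, d))));

        // Combiner runs twice on the first group, never on the remaining points.
        Partial combinedTwice = mergeAll(Arrays.asList(mergeAll(Arrays.asList(a, b))));
        Partial mixedRuns = mergeAll(Arrays.asList(combinedTwice, c, d));

        // All three lines print [4.0, 5.0].
        System.out.println(Arrays.toString(zeroRuns.centroid()));
        System.out.println(Arrays.toString(oneRun.centroid()));
        System.out.println(Arrays.toString(mixedRuns.centroid()));
    }
}
```

The design choice in this sketch is that the combiner's output has the same shape as its input, and the division by the count is deferred to the reducer. A combiner that averaged its inputs, or that assumed every incoming value was a single raw point, would give a different result depending on how many times it was invoked.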
Attachments
Issue Links
- relates to: MAHOUT-79 Improving the speed of Fuzzy K-Means by optimizing data transfer between map and reduce tasks (Closed)