r1298625 made the following changes:
MAHOUT-933:
- refactored ClusteringPolicies into hierarchy under new AbstractClusteringPolicy
- added close() to ClusteringPolicy to allow policy-specific actions needed to compute convergence
- removed ClusteringPolicy from ClusterIterator constructor as ClusterClassifier already has one
- added convergence computations for kmeans and fuzzyk
- added renaming of the final clustersOut to append a -final suffix
- updated Display examples and unit tests to reflect above
- all tests run
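As a rough illustration of the policy hierarchy and the close() hook listed above, here is a minimal, self-contained Java sketch. It does not use Mahout's actual classes or signatures: PolicySketch, the simplified one-dimensional Cluster, KMeansLikePolicy, allConverged and the convergenceDelta parameter are all hypothetical stand-ins, and the convergence test (center movement within a threshold) is an assumption about what a k-means style policy might compute when closed.

```java
import java.util.List;

/** Illustrative sketch only; none of these types are Mahout's real API. */
public final class PolicySketch {

    /** Hypothetical stand-in for a cluster with a 1-D center. */
    public static final class Cluster {
        final double previousCenter;
        final double center;
        boolean converged;

        public Cluster(double previousCenter, double center) {
            this.previousCenter = previousCenter;
            this.center = center;
        }
    }

    /** Assumed shape of a policy with an end-of-iteration close() hook. */
    public interface ClusteringPolicy {
        void update(List<Cluster> clusters);
        void close(List<Cluster> clusters);
    }

    /** Base class providing no-op defaults, so subclasses override only what they need. */
    public abstract static class AbstractClusteringPolicy implements ClusteringPolicy {
        @Override
        public void update(List<Cluster> clusters) {
            // no policy-specific work by default
        }

        @Override
        public void close(List<Cluster> clusters) {
            // no policy-specific work by default
        }
    }

    /** Hypothetical k-means style policy: close() marks clusters whose center barely moved. */
    public static final class KMeansLikePolicy extends AbstractClusteringPolicy {
        private final double convergenceDelta;

        public KMeansLikePolicy(double convergenceDelta) {
            this.convergenceDelta = convergenceDelta;
        }

        @Override
        public void close(List<Cluster> clusters) {
            for (Cluster c : clusters) {
                c.converged = Math.abs(c.center - c.previousCenter) <= convergenceDelta;
            }
        }
    }

    /** Returns true only if every cluster has been marked converged. */
    public static boolean allConverged(List<Cluster> clusters) {
        for (Cluster c : clusters) {
            if (!c.converged) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch an iterator would call close() once per iteration and stop early when allConverged returns true; policies that need no convergence step simply inherit the no-op default.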
I think it is time to begin refactoring the buildClusters methods of the respective clustering drivers to use ClusterIterator, as it seems to be producing results equivalent to the original implementations. This will involve removing a lot of existing driver, mapper, and reducer code and many time-consuming unit tests. It will also have some impact on other components, as the representation of clusters in the file system changes from Cluster to the self-describing ClusterWritable.
I have created separate subtasks for these conversion issues so that they can be undertaken independently.
Integrated in Mahout-Quality #1272 (See https://builds.apache.org/job/Mahout-Quality/1272/)
MAHOUT-846: Improved scalability of GaussianCluster.pdf. Introduced some beginnings for MAHOUT-933. All tests run.
jeastman : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1224730
Files :