Description
The current clustering drivers have a -cp option that produces a clusteredPoints directory containing the input vectors classified by the final clusters the algorithm produced. This option is implemented redundantly across those drivers.
- Factor out and implement a standalone post-processor that performs the classification step independently of the various clustering implementations.
- Implement a pluggable outlier removal capability for this classifier.
- Consider building off of the ClusterClassifier & ClusterIterator ideas.
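To make the proposal concrete, here is a minimal sketch of what a standalone classification step with a pluggable outlier policy could look like. It does not use Mahout's actual classes: `ClusterPostProcessorSketch`, `OutlierPolicy`, and `classify` are hypothetical names, clusters are reduced to plain center arrays, and nearest-center Euclidean distance is an assumed classification rule chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Mahout API: a post-processing step that classifies
// input vectors against final cluster centers, separate from any clustering driver.
final class ClusterPostProcessorSketch {

    /** Hypothetical plug-in point: decides whether a point is rejected as an outlier. */
    interface OutlierPolicy {
        boolean isOutlier(double distanceToNearestCenter);
    }

    /**
     * Assigns each point to its nearest center (assumed rule: Euclidean distance)
     * and drops any point that the supplied policy flags as an outlier.
     * Returns a map from center index to the points assigned to that center.
     */
    static Map<Integer, List<double[]>> classify(
            List<double[]> centers, List<double[]> points, OutlierPolicy policy) {
        Map<Integer, List<double[]>> clustered = new LinkedHashMap<>();
        for (double[] point : points) {
            int best = -1;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (int i = 0; i < centers.size(); i++) {
                double d = distance(centers.get(i), point);
                if (d < bestDistance) {
                    bestDistance = d;
                    best = i;
                }
            }
            // Skip points with no candidate center or rejected by the policy.
            if (best < 0 || policy.isOutlier(bestDistance)) {
                continue;
            }
            clustered.computeIfAbsent(best, k -> new ArrayList<>()).add(point);
        }
        return clustered;
    }

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        List<double[]> centers = Arrays.asList(new double[] {0, 0}, new double[] {10, 10});
        List<double[]> points = Arrays.asList(
                new double[] {1, 1}, new double[] {9, 9}, new double[] {50, 50});
        // Example policy: reject anything farther than 5.0 from its nearest center.
        Map<Integer, List<double[]>> result = classify(centers, points, d -> d > 5.0);
        for (Map.Entry<Integer, List<double[]>> e : result.entrySet()) {
            System.out.println("cluster " + e.getKey() + ": " + e.getValue().size() + " point(s)");
        }
    }
}
```

Because the outlier rule is passed in as an interface, a different removal strategy can be swapped in without touching the classification loop, which is the separation the bullets above ask for.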
Attachments
Issue Links
- incorporates
  - MAHOUT-930 Refactor Vector Classification out of Clustering - Make Classification abstract (Closed)
  - MAHOUT-931 Implement a pluggable outlier removal capability for cluster classifiers (Closed)
  - MAHOUT-933 Implement mapreduce version of ClusterIterator (Closed)