Mahout / MAHOUT-929

Refactor Clustering (Vector Classification) into a Separate Postprocess with Outlier Pruning

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.7
    • Component/s: Classification, Clustering
    • Labels:
      None

      Description

      The current clustering drivers have a -cp option to produce a clusteredPoints directory containing the input vectors classified by the final clusters the algorithm produces. This option is implemented redundantly in each of those drivers.

      • Factor out and implement an independent postprocessor that performs the classification step separately from the various clustering implementations.
      • Implement a pluggable outlier removal capability for this classifier.
      • Consider building off of the ClusterClassifier & ClusterIterator ideas.
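
      The proposal above can be illustrated with a minimal, self-contained sketch. It is not the code from the attached patches and does not use Mahout's ClusterClassifier or ClusterIterator; the class ClusterPostProcessor, the OutlierPolicy interface, the Assignment type, and the use of plain double[] vectors with Euclidean distance are all assumptions made for illustration. It shows classification as a separate step that takes only the final cluster centers and the input vectors, with the outlier rule passed in as a pluggable policy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: classification as a postprocess, independent of
// whichever clustering algorithm produced the centers.
public class ClusterPostProcessor {

    /** Pluggable outlier rule; hypothetical, not part of Mahout's API. */
    @FunctionalInterface
    public interface OutlierPolicy {
        boolean isOutlier(double distanceToNearestCenter);
    }

    /** One classified vector: its input index, chosen cluster, and distance. */
    public static final class Assignment {
        public final int vectorIndex;
        public final int clusterIndex;
        public final double distance;

        Assignment(int vectorIndex, int clusterIndex, double distance) {
            this.vectorIndex = vectorIndex;
            this.clusterIndex = clusterIndex;
            this.distance = distance;
        }
    }

    /** Euclidean distance between two vectors of equal length. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Assigns each vector to its nearest center and drops any vector
     * the policy flags as an outlier.
     */
    public static List<Assignment> classify(double[][] centers,
                                            double[][] vectors,
                                            OutlierPolicy policy) {
        List<Assignment> kept = new ArrayList<>();
        for (int v = 0; v < vectors.length; v++) {
            int best = -1;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centers.length; c++) {
                double d = euclidean(vectors[v], centers[c]);
                if (d < bestDistance) {
                    best = c;
                    bestDistance = d;
                }
            }
            if (best >= 0 && !policy.isOutlier(bestDistance)) {
                kept.add(new Assignment(v, best, bestDistance));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[][] centers = {{0, 0}, {10, 10}};
        double[][] vectors = {{1, 1}, {9, 10}, {50, 50}};
        // Illustrative policy: prune anything farther than 5.0 from its center.
        for (Assignment a : classify(centers, vectors, d -> d > 5.0)) {
            System.out.println("vector " + a.vectorIndex + " -> cluster " + a.clusterIndex);
        }
    }
}
```

      Because the outlier rule is an interface, a distance threshold can be swapped for another criterion without touching the classification loop, which is the kind of pluggability the second bullet asks for.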

      Attachments

      1. Mahout-929 (13 kB, Paritosh Ranjan)
      2. Mahout-929 (30 kB, Paritosh Ranjan)
      3. Mahout-929 (28 kB, Paritosh Ranjan)
      4. Mahout-929 (11 kB, Paritosh Ranjan)

            People

            • Assignee:
              Paritosh Ranjan
              Reporter:
              Jeff Eastman
            • Votes:
              0
              Watchers:
              1