  Mahout / MAHOUT-929

Refactor Clustering (Vector Classification) into a Separate Postprocess with Outlier Pruning

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.7
    • Component/s: Classification, Clustering
    • Labels: None

      Description

      The current clustering drivers each have a -cp option that produces a clusteredPoints directory containing the input vectors classified by the final clusters the algorithm produced. Each driver implements this option on its own, so the same logic is duplicated across them.

      • Factor out and implement a standalone postprocessor that performs the classification step independently of the various clustering implementations.
      • Implement a pluggable outlier removal capability for this classifier.
      • Consider building on the ClusterClassifier and ClusterIterator ideas.
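
      As a rough illustration of the first two points, the sketch below separates classification from clustering and takes the outlier rule as a plug-in. It is not Mahout's API: the class ClusterPostProcessorSketch, the OutlierPolicy interface, the use of plain double[] vectors, and the nearest-center Euclidean rule are all assumptions made for this example. A real implementation would presumably work on Mahout vectors and the cluster models the drivers emit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Mahout itself.
public class ClusterPostProcessorSketch {

    /** Assumed plug-in point: decides whether a point lies too far from its best cluster. */
    public interface OutlierPolicy {
        boolean isOutlier(double distanceToNearestCenter);
    }

    /** Plain Euclidean distance between two vectors of equal length. */
    public static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Assigns each point to its nearest final cluster center and drops the
     * points the policy rejects. Returns cluster index -> assigned points.
     * Runs after clustering and knows nothing about how the centers were found.
     */
    public static Map<Integer, List<double[]>> classify(
            List<double[]> centers, List<double[]> points, OutlierPolicy policy) {
        Map<Integer, List<double[]>> clusteredPoints = new LinkedHashMap<>();
        for (double[] point : points) {
            int best = -1;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centers.size(); c++) {
                double d = euclidean(point, centers.get(c));
                if (d < bestDistance) {
                    bestDistance = d;
                    best = c;
                }
            }
            // Skip points with no candidate cluster or that the policy prunes.
            if (best < 0 || policy.isOutlier(bestDistance)) {
                continue;
            }
            clusteredPoints.computeIfAbsent(best, k -> new ArrayList<>()).add(point);
        }
        return clusteredPoints;
    }

    public static void main(String[] args) {
        List<double[]> centers = Arrays.asList(new double[]{0, 0}, new double[]{10, 10});
        List<double[]> points = Arrays.asList(
                new double[]{1, 0}, new double[]{9, 10}, new double[]{50, 50});
        // Illustrative threshold: prune anything farther than 5.0 from its nearest center.
        Map<Integer, List<double[]>> result = classify(centers, points, d -> d > 5.0);
        for (Map.Entry<Integer, List<double[]>> e : result.entrySet()) {
            System.out.println("cluster " + e.getKey() + ": " + e.getValue().size() + " point(s)");
        }
    }
}
```

      Because the policy is passed in, a caller could swap the fixed distance threshold for another rule (or for d -> false to keep every point) without touching the classification loop.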

        Attachments

        1. Mahout-929
          13 kB
          Paritosh Ranjan
        2. Mahout-929
          30 kB
          Paritosh Ranjan
        3. Mahout-929
          28 kB
          Paritosh Ranjan
        4. Mahout-929
          11 kB
          Paritosh Ranjan

              People

              • Assignee: Paritosh Ranjan (paritoshranjan)
              • Reporter: Jeff Eastman (jeastman)
              • Votes: 0
              • Watchers: 1
