  1. Mahout
  2. MAHOUT-929

Refactor Clustering (Vector Classification) into a Separate Postprocess with Outlier Pruning


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.7
    • Component/s: classic
    • Labels: None

    Description

      Each of the current clustering drivers has a -cp option that produces a clusteredPoints directory containing the input vectors classified by the final clusters the algorithm produced. This option is implemented redundantly in every driver.

      • Factor the classification step out into a standalone post processor that runs independently of the individual clustering implementations.
      • Implement a pluggable outlier removal capability for this classifier.
      • Consider building on the ClusterClassifier and ClusterIterator ideas.
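
      The proposal above can be illustrated with a minimal, self-contained sketch. It is not Mahout code and does not use the Mahout API: the class ClusterPostProcessSketch, the OutlierPolicy interface, the Euclidean distance measure, and the threshold of 5.0 are all assumptions made for illustration. It only shows the shape of the idea: classification runs as a separate step over the final cluster centers, and a pluggable policy decides which vectors are pruned as outliers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; none of these names come from Mahout. */
class ClusterPostProcessSketch {

    /** Pluggable outlier policy (assumed design): true means "prune this point". */
    interface OutlierPolicy {
        boolean isOutlier(double distanceToNearestCenter);
    }

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Assigns each point to its nearest center and skips points the policy rejects.
     * Returns a map from cluster index to the points kept in that cluster.
     */
    static Map<Integer, List<double[]>> classify(
            List<double[]> centers, List<double[]> points, OutlierPolicy policy) {
        Map<Integer, List<double[]>> clustered = new LinkedHashMap<>();
        for (double[] p : points) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centers.size(); c++) {
                double d = distance(p, centers.get(c));
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            // No centers at all, or the policy flags the point: leave it out.
            if (best < 0 || policy.isOutlier(bestDist)) {
                continue;
            }
            clustered.computeIfAbsent(best, k -> new ArrayList<>()).add(p);
        }
        return clustered;
    }

    public static void main(String[] args) {
        List<double[]> centers =
            Arrays.asList(new double[] {0, 0}, new double[] {10, 10});
        List<double[]> points = Arrays.asList(
            new double[] {1, 1}, new double[] {9, 10}, new double[] {50, 50});
        // Assumed policy: anything farther than 5.0 from its nearest center is an outlier.
        Map<Integer, List<double[]>> result = classify(centers, points, d -> d > 5.0);
        result.forEach((cluster, members) ->
            System.out.println("cluster " + cluster + ": " + members.size() + " point(s)"));
    }
}
```

      Because the policy is an interface, a distance threshold could be swapped for another rule (for example a probability cutoff) without touching the classification loop, which is the kind of pluggability the second bullet asks for.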

      Attachments

        1. Mahout-929
          11 kB
          Paritosh Ranjan
        2. Mahout-929
          28 kB
          Paritosh Ranjan
        3. Mahout-929
          30 kB
          Paritosh Ranjan
        4. Mahout-929
          13 kB
          Paritosh Ranjan


            People

              Assignee: Paritosh Ranjan
              Reporter: Jeff Eastman
              Votes: 0
              Watchers: 1
