Mahout / MAHOUT-818

Canopy Emits Too Many Trivial Clusters


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.6
    • Component/s: classic
    • Labels: None

    Description

      Users of Canopy clustering report that the single reducer used in the MapReduce version often takes disproportionately long to process the results of multiple mappers. This patch introduces a new Canopy CLI argument, cf (-clusterFilter), which, if present, sets a lower bound on the numPoints of canopies output by the algorithm: a canopy is output only if its numPoints is greater than the filter value. The default value for this filter is 0, so all canopies are output. Setting -cf 1 eliminates any canopies that contain only 1 point from subsequent processing steps.
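      The filtering rule above can be sketched as follows. This is a minimal illustration, not the patch's code: it assumes, based on the -cf 1 example, that a canopy is kept only when its numPoints is strictly greater than the filter value. The Canopy record and the filter method are hypothetical stand-ins, not Mahout's own classes or API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the cf (clusterFilter) semantics.
 * Canopy here is a stand-in record, not Mahout's Canopy class.
 */
public class CanopyFilterSketch {

    /** Minimal stand-in for a canopy: an id and the number of points it covers. */
    public record Canopy(int id, long numPoints) {}

    /**
     * Keeps only canopies whose numPoints is strictly greater than clusterFilter.
     * With clusterFilter = 0 every non-empty canopy survives; with 1, singletons are dropped.
     */
    public static List<Canopy> filter(List<Canopy> canopies, int clusterFilter) {
        List<Canopy> kept = new ArrayList<>();
        for (Canopy c : canopies) {
            if (c.numPoints() > clusterFilter) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Canopy> canopies = List.of(
            new Canopy(0, 1), new Canopy(1, 5), new Canopy(2, 1), new Canopy(3, 12));
        // Default filter of 0: all four canopies are kept.
        System.out.println(filter(canopies, 0).size());
        // Filter of 1: the two single-point canopies are dropped, leaving two.
        System.out.println(filter(canopies, 1).size());
    }
}
```

      Running main prints 4 and then 2. Because the comparison is strict, the default of 0 is a no-op for any canopy that holds at least one point, which matches the description's claim that all canopies are output by default.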

      Attachments

        1. vector.tar.gz (1.89 MB, attached by beneo)
        2. MAHOUT-818.patch (30 kB, attached by Jeff Eastman)


          People

            Assignee: Jeff Eastman (jeastman)
            Reporter: Jeff Eastman (jeastman)
            Votes: 0
            Watchers: 0
