Uploaded image for project: 'Mahout'
  1. Mahout
  2. MAHOUT-1162

Adding BallKMeans and StreamingKMeans classes

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.8
    • 0.8
    • classic
    • None

    Description

      Adding BallKMeans and StreamingKMeans clustering algorithms.
      These both implement Iterable<Centroid> and thus return the resulting centroids after clustering.

      BallKMeans implements:

      • kmeans++ initialization;
      • a normal k-means pass;
      • a trimming threshold so that points that are too far from the cluster they were assigned to are not used in the new centroid computation.

      StreamingKMeans implements http://books.nips.cc/papers/files/nips24/NIPS2011_1271.pdf:

      • an online clustering algorithm that takes each point into account one by one
      • for each point, it computes the distance to the nearest existing cluster
      • if the distance is greater than a set distanceCutoff, it will create a new cluster, otherwise it might be added to the cluster it's closest to (proportional to the value of the distance / distanceCutoff)
      • if there are too many clusters, the clusters will be collapsed (the same method gets called, but the number of clusters is re-adjusted)
      • finally, about as many clusters as requested are returned (not precise!); this represents a sketch of the original points.

      Attachments

        1. MAHOUT_1162_with_test.patch
          56 kB
          Dan Filimon

        Activity

          People

            Unassigned Unassigned
            dfilimon Dan Filimon
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: