Uploaded image for project: 'Mahout'
  1. Mahout
  2. MAHOUT-1162

Adding BallKMeans and StreamingKMeans classes

    XMLWordPrintableJSON

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8
    • Fix Version/s: 0.8
    • Component/s: Clustering
    • Labels:
      None

      Description

      Adding BallKMeans and StreamingKMeans clustering algorithms.
      These both implement Iterable<Centroid> and thus return the resulting centroids after clustering.

      BallKMeans implements:

      • kmeans++ initialization;
      • a normal k-means pass;
      • a trimming threshold so that points that are too far from the cluster they were assigned to are not used in the new centroid computation.

      StreamingKMeans implements http://books.nips.cc/papers/files/nips24/NIPS2011_1271.pdf:

      • an online clustering algorithm that takes each point into account one by one
      • for each point, it computes the distance to the nearest existing cluster
      • if the distance is greater than a set distanceCutoff, it will create a new cluster, otherwise it might be added to the cluster it's closest to (proportional to the value of the distance / distanceCutoff)
      • if there are too many clusters, the clusters will be collapsed (the same method gets called, but the number of clusters is re-adjusted)
      • finally, about as many clusters as requested are returned (not precise!); this represents a sketch of the original points.

        Attachments

        1. MAHOUT_1162_with_test.patch
          56 kB
          Dan Filimon

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              dfilimon Dan Filimon
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: