Spark / SPARK-14174

Implement the Mini-Batch KMeans

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None

    Description

      The MiniBatchKMeans is a variant of the KMeans algorithm which uses mini-batches to reduce the computation time, while still attempting to optimise the same objective function. Mini-batches are subsets of the input data, randomly sampled in each training iteration. These mini-batches drastically reduce the amount of computation required to converge to a local solution. In contrast to other algorithms that reduce the convergence time of k-means, mini-batch k-means produces results that are generally only slightly worse than the standard algorithm.

      Comparison of KMeans and MiniBatchKMeans in scikit-learn: http://scikit-learn.org/stable/auto_examples/cluster/plot_mini_batch_kmeans.html#example-cluster-plot-mini-batch-kmeans-py

      Because MiniBatch-KMeans with fraction=1.0 is not equivalent to KMeans, I implemented it as a new estimator.
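
      The mini-batch procedure described above can be sketched in plain Python as follows. This is only an illustration, not the Spark implementation proposed in this issue: it assumes the per-center learning-rate update from Sculley's mini-batch k-means (the scheme scikit-learn's MiniBatchKMeans documentation cites), and the names `mini_batch_kmeans`, `fraction`, `iterations` and `seed` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mini_batch_kmeans(points, k, fraction=0.1, iterations=100, seed=0):
    """Illustrative mini-batch k-means (assumed Sculley-style update).

    `fraction` is a hypothetical parameter: the share of the input
    that is randomly sampled as the mini-batch in each iteration.
    """
    rng = random.Random(seed)
    # Initialise centers from k distinct random input points.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points each center has absorbed so far
    batch_size = min(len(points), max(1, int(fraction * len(points))))

    for _ in range(iterations):
        batch = rng.sample(points, batch_size)
        # Assign the whole batch using the centers as they stand
        # at the start of the iteration.
        assignments = [nearest_center(p, centers) for p in batch]
        # Move each assigned center toward its point with a
        # per-center step size that shrinks as the center sees more data.
        for p, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1.0 - eta) * c + eta * x
                          for c, x in zip(centers[j], p)]
    return centers


if __name__ == "__main__":
    data_rng = random.Random(42)
    data = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(200)]
    data += [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(200)]
    print(mini_batch_kmeans(data, k=2, fraction=0.1, iterations=50, seed=1))
```

      Under this assumed update rule, setting fraction=1.0 still applies incremental per-point updates with a decaying step size instead of recomputing every center as the mean of its assigned points, which is one way the result can differ from standard k-means.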

      Attachments

        1. MBKM.xlsx
          33 kB
          Ruifeng Zheng

            People

              Assignee: Unassigned
              Reporter: Ruifeng Zheng (podongfeng)
              Votes: 2
              Watchers: 7

              Dates

                Created:
                Updated:
                Resolved: