Spark / SPARK-2308

Add KMeans MiniBatch clustering algorithm to MLlib

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: MLlib

    Description

      Mini-batch KMeans is a variant of KMeans that updates the cluster centers from a randomly sampled subset of the data points in each iteration instead of the full set of data points, which reduces the cost of each iteration (and in some cases improves accuracy). The mini-batch version is compatible with the KMeans|| initialization algorithm currently implemented in MLlib.
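
For illustration, one common formulation of the mini-batch update can be sketched as follows: each iteration samples a batch, assigns its points to the nearest current center, and moves each center toward its assigned points with a per-center learning rate that decays as the center accumulates points. This is a minimal single-machine Python sketch, not the MLlib implementation and not necessarily the variant proposed here. The function names, parameters, and the plain random initialization (standing in for KMeans||) are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mini_batch_kmeans(points, k, batch_size, iterations, seed=0):
    """Hypothetical helper: mini-batch KMeans on a list of numeric vectors."""
    rng = random.Random(seed)
    # Assumption: random initialization is used here in place of KMeans||.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points each center has absorbed so far

    for _ in range(iterations):
        # Use a random subset of the data instead of the full data set.
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign every batch point against the centers as they stand
        # at the start of the iteration.
        assignments = [nearest_center(p, centers) for p in batch]
        for p, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate
            # Move the center a fraction eta of the way toward the point.
            centers[j] = [(1.0 - eta) * c + eta * x
                          for c, x in zip(centers[j], p)]
    return centers


if __name__ == "__main__":
    data_rng = random.Random(42)
    data = [[data_rng.gauss(0.0, 0.5), data_rng.gauss(0.0, 0.5)] for _ in range(200)]
    data += [[data_rng.gauss(5.0, 0.5), data_rng.gauss(5.0, 0.5)] for _ in range(200)]
    print(mini_batch_kmeans(data, k=2, batch_size=32, iterations=50))
```

Because each update is a convex combination of the old center and a data point, every center stays inside the bounding box of the data. The cost of an iteration depends on `batch_size` rather than on the size of the full data set.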

      I suggest adding KMeans Mini-batch as an alternative.

      I'd like this to be assigned to me.

      Attachments

        1. many_small_centers.pdf
          20 kB
          R J Nowling
        2. uneven_centers.pdf
          15 kB
          R J Nowling

            People

              Assignee: R J Nowling (rnowling)
              Reporter: R J Nowling (rnowling)
              Votes: 0
              Watchers: 8
