  1. Spark
  2. SPARK-43297

Make improvement to LocalKMeans

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: MLlib

    Description

      KMeans supports two values for initializationMode: random and parallel (k-means||).

      The parallel mode uses kMeansPlusPlus to generate the initial centers, but kMeansPlusPlus is a local method that runs on the driver.

      If the number of points is very large, kMeansPlusPlus takes a long time, because it is a single-threaded method running on the driver.

      We can make this method faster by running it in parallel, for example with Scala parallel collections.
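
      A minimal sketch of the idea, not Spark's LocalKMeans code: the expensive part of k-means++ seeding is recomputing, for every point, the squared distance to the nearest chosen center, and that per-point work is independent, so it can be mapped over a parallel collection. The object and method names (ParallelKMeansPlusPlusSketch, chooseCenters, pickWeighted) are hypothetical, points are plain Array[Double], and details a real implementation would need, such as per-point weights, are left out. It assumes Scala 2.12, where `.par` is built in.

```scala
// Sketch only. Assumes Scala 2.12, where `.par` needs no extra dependency.
// On Scala 2.13, add the scala-parallel-collections module and
// `import scala.collection.parallel.CollectionConverters._`.
import scala.util.Random

object ParallelKMeansPlusPlusSketch {
  type Point = Array[Double]

  /** Squared Euclidean distance between two points of equal dimension. */
  def squaredDistance(a: Point, b: Point): Double = {
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  /** Pick an index with probability proportional to weights(i). */
  def pickWeighted(weights: Array[Double], rand: Random): Int = {
    val r = rand.nextDouble() * weights.sum
    var cumulative = 0.0
    var j = 0
    while (j < weights.length && cumulative < r) {
      cumulative += weights(j)
      j += 1
    }
    math.max(j - 1, 0)
  }

  /** Choose k initial centers with k-means++ seeding (unweighted points). */
  def chooseCenters(points: Array[Point], k: Int, seed: Long): Array[Point] = {
    require(points.nonEmpty && k > 0, "need at least one point and k > 0")
    val rand = new Random(seed)
    val centers = new Array[Point](k)
    centers(0) = points(rand.nextInt(points.length))

    // cost(i) = squared distance from points(i) to its nearest chosen center.
    // Each entry is independent, so the map runs on a parallel collection.
    val first = centers(0)
    var cost: Array[Double] = points.par.map(p => squaredDistance(p, first)).toArray

    var c = 1
    while (c < k) {
      // Sampling stays sequential so a fixed seed gives a reproducible result.
      val newest = points(pickWeighted(cost, rand))
      centers(c) = newest
      val previous = cost
      // Parallel update: only the newest center can lower a point's cost.
      cost = points.indices.par
        .map(i => math.min(previous(i), squaredDistance(points(i), newest)))
        .toArray
      c += 1
    }
    centers
  }
}
```

      Only the distance update is parallelized here. The weighted sampling of the next center remains on a single thread, which keeps the choice of centers deterministic for a given seed.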


          People

            Assignee: Unassigned
            Reporter: leo wen (wenweijian)
            Votes: 0
            Watchers: 2
