SPARK-3220

K-Means clusterer should perform K-Means initialization in parallel

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: MLlib

    Description

      The LocalKMeans method should be replaced with a parallel implementation. As it stands now, it becomes a bottleneck for large data sets.

      I have implemented this functionality in my version of the clusterer. However, I see that there are hundreds of outstanding pull requests. If someone on the team wants to sponsor the pull request, I will create one. Otherwise, I will just maintain my own private fork of the clusterer.
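
      To illustrate what a parallel replacement might look like, here is a minimal sketch of k-means++ seeding in which the expensive step (updating every point's distance to the nearest chosen center) is split into chunks and mapped over a worker pool. This is not the reporter's implementation and not Spark code: the function names, the chunking scheme, and the use of a Python thread pool as a stand-in for cluster workers are all assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update_costs(chunk, costs, center):
    """Lower each point's cost to its squared distance from the newest center."""
    return [min(c, sq_dist(p, center)) for p, c in zip(chunk, costs)]


def parallel_kmeans_pp(points, k, num_chunks=4, seed=0):
    """Hypothetical helper: k-means++ seeding with chunked cost updates.

    `points` is a non-empty list of equal-length numeric tuples.
    """
    rng = random.Random(seed)
    size = -(-len(points) // num_chunks)  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    costs = [[float("inf")] * len(chunk) for chunk in chunks]
    centers = [rng.choice(points)]
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        while len(centers) < k:
            # Each worker updates the costs of one chunk against the newest center.
            newest = [centers[-1]] * len(chunks)
            costs = list(pool.map(update_costs, chunks, costs, newest))
            # The driver samples the next center with probability proportional to cost.
            weights = [w for chunk_costs in costs for w in chunk_costs]
            if sum(weights) == 0:
                break  # fewer distinct points than k
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0), (20.0, 0.0), (20.1, 0.0)]
    print(parallel_kmeans_pp(data, 3))
```

      Threads in CPython will not speed up this pure-Python arithmetic; the pool only shows the shape of the computation. In a distributed setting the chunks would presumably correspond to data partitions, with only the sampled center sent back to the workers each round.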

            People

              Assignee: Unassigned
              Reporter: Derrick Burns (derrickburns)
              Votes: 2
              Watchers: 8
