  1. Spark
  2. SPARK-6137

G-Means clustering algorithm implementation


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: MLlib

    Description

      Would it be useful to implement the G-means clustering algorithm on top of k-means?
      G-means is a powerful extension of k-means that uses a normality test on each cluster's data to decide whether the cluster should be split into two new ones. Its complexity relative to k-means is O(K), where K is the maximum number of clusters.
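
      The split decision described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the reporter's R prototype and not Spark code. It assumes the cluster has already been tentatively divided into two child centers (for example by running 2-means on it), projects the points onto the line joining those centers, and applies an Anderson-Darling normality test to the projection. The function names, the small-sample guard, and the default critical value of 1.8692 (understood to correspond to a significance level of 0.0001 in the original paper) are assumptions made for illustration and should be checked before use.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(xs):
    """Return the corrected Anderson-Darling statistic A*^2 for a 1-D sample,
    with mean and variance estimated from the sample itself."""
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    std_normal = NormalDist()
    # Standardize, sort, and evaluate the standard normal CDF at each point.
    cdf = [std_normal.cdf((x - mu) / sigma) for x in sorted(xs)]
    # Clamp away from 0 and 1 so the logarithms below stay finite.
    eps = 1e-12
    cdf = [min(max(p, eps), 1.0 - eps) for p in cdf]
    s = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - s / n
    # Small-sample correction for the case of estimated mean and variance.
    return a2 * (1.0 + 4.0 / n - 25.0 / (n * n))


def should_split(points, child_a, child_b, critical_value=1.8692):
    """Decide whether a cluster looks non-Gaussian along the axis joining
    its two tentative child centers (hypothetical helper, not a Spark API)."""
    v = [a - b for a, b in zip(child_a, child_b)]
    norm_sq = sum(c * c for c in v)
    if len(points) < 2 or norm_sq == 0.0:
        return False  # nothing to test
    # Project every point onto the direction connecting the child centers.
    projected = [sum(p_i * v_i for p_i, v_i in zip(p, v)) / norm_sq for p in points]
    if stdev(projected) == 0.0:
        return False  # degenerate projection, keep the cluster as is
    return anderson_darling_normal(projected) > critical_value
```

      A caller would keep the original center when should_split returns False and replace it with the two child centers otherwise, repeating until no cluster is split or the maximum number of clusters K is reached.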

      The original paper is by Greg Hamerly and Charles Elkan of the University of California:
      http://papers.nips.cc/paper/2526-learning-the-k-in-k-means.pdf

      I also have a small prototype of this algorithm written in R (if anyone is interested in it).


          People

            Assignee: Unassigned
            Reporter: Denis Dus (denmoroz)
            Votes: 0
            Watchers: 7
