  1. Commons Math
  2. MATH-548

KMeansPlusPlusClusterer should run multiple trials

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed

    Description

      The interface and documentation for KMeansPlusPlusClusterer imply that a single call to cluster() is sufficient to get the optimal set of clusters. That is not the case: the initial centers are chosen at random, so each call may converge to a different local optimum. Practically every client should call cluster() multiple times and select the best resulting set of clusters. Rather than forcing every client to implement this, I think the functionality belongs directly in the KMeansPlusPlusClusterer class.

      I propose adding a new method to KMeansPlusPlusClusterer:
      List<Cluster<T>> cluster(Collection<T> points, int k, int numTrials, int maxIterationsPerTrial)
      which calls the existing cluster() method numTrials times, returning the best result.
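
      A minimal sketch of the multi-trial loop described above. It does not use Commons Math directly: MultiTrialClustering, SingleTrial, and bestOf are hypothetical names, and the cost function (for example, the total within-cluster squared distance) is an assumption, since this issue does not say how the "best" result should be measured.

```java
import java.util.Collection;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper for illustration; not part of Commons Math. */
final class MultiTrialClustering {

    /** Stand-in for one randomized run, e.g. an existing cluster(points, k, maxIterations). */
    interface SingleTrial<T, R> {
        R run(Collection<T> points, int k, int maxIterations);
    }

    /**
     * Runs the trial numTrials times and returns the result with the lowest cost.
     * The cost function is supplied by the caller (an assumption of this sketch).
     */
    static <T, R> R bestOf(SingleTrial<T, R> trial,
                           ToDoubleFunction<R> cost,
                           Collection<T> points,
                           int k,
                           int numTrials,
                           int maxIterationsPerTrial) {
        if (numTrials < 1) {
            throw new IllegalArgumentException("numTrials must be at least 1");
        }
        R best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int i = 0; i < numTrials; i++) {
            // Each run starts from a fresh random seeding, so results can differ.
            R candidate = trial.run(points, k, maxIterationsPerTrial);
            double candidateCost = cost.applyAsDouble(candidate);
            // Keep the first result unconditionally, then only strict improvements.
            if (best == null || candidateCost < bestCost) {
                best = candidate;
                bestCost = candidateCost;
            }
        }
        return best;
    }
}
```

      Assuming the existing method has the shape cluster(points, k, maxIterations), a client could pass it as the trial (for instance as a method reference) together with whatever cost measure suits the data. Inside the class itself the same loop would simply call the existing cluster() method.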

          People

            Assignee: Unassigned
            Reporter: Nate Paymer (npaymer)