Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Description
The interface and documentation for KMeansPlusPlusClusterer imply that a single call to cluster() is sufficient to obtain the optimal set of clusters. This isn't true: the k-means++ seeding is randomized and the subsequent iterations converge only to a local optimum, so practically every client should call cluster() several times and keep the best resulting set of clusters. It seems to me that, rather than forcing every client to implement this, the functionality should be placed directly in the KMeansPlusPlusClusterer class.
I propose adding a new method to KMeansPlusPlusClusterer:
List<Cluster<T>> cluster(Collection<T> points, int k, int numTrials, int maxIterationsPerTrial)
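which calls the existing cluster() method numTrials times and returns the best result (for example, the set of clusters with the smallest total squared distance from each point to its cluster center).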
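A minimal, self-contained sketch of how such a multi-trial method could work follows. It deliberately does not use the Commons Math classes: the class name MultiTrialKMeans, the plain double[] points, the explicit Random parameter, and the choice of "best" as the lowest sum of squared distances from points to their cluster means are all assumptions for illustration, not the library's actual API or selection rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of the proposed multi-trial k-means++ (not the Commons Math API). */
public class MultiTrialKMeans {

    static double dist2(double[] a, double[] b) {  // squared Euclidean distance
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    // k-means++ seeding: first center uniform at random, each later center drawn
    // with probability proportional to its squared distance to the nearest chosen center.
    static double[][] seed(double[][] pts, int k, Random rnd) {
        double[][] centers = new double[k][];
        centers[0] = pts[rnd.nextInt(pts.length)];
        double[] d2 = new double[pts.length];
        for (int c = 1; c < k; c++) {
            double total = 0;
            for (int i = 0; i < pts.length; i++) {
                d2[i] = Double.MAX_VALUE;
                for (int j = 0; j < c; j++) d2[i] = Math.min(d2[i], dist2(pts[i], centers[j]));
                total += d2[i];
            }
            double r = rnd.nextDouble() * total;
            int pick = pts.length - 1;
            for (int i = 0; i < pts.length; i++) {
                r -= d2[i];
                if (r < 0) { pick = i; break; }
            }
            centers[c] = pts[pick];
        }
        return centers;
    }

    // One trial: seeding plus at most maxIterations Lloyd steps (assign, then re-center).
    static List<List<double[]>> singleTrial(double[][] pts, int k, int maxIterations, Random rnd) {
        double[][] centers = seed(pts, k, rnd);
        int[] assign = new int[pts.length];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int i = 0; i < pts.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (dist2(pts[i], centers[c]) < dist2(pts[i], centers[best])) best = c;
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break;  // converged to a (possibly local) optimum
            double[][] sums = new double[k][pts[0].length];
            int[] counts = new int[k];
            for (int i = 0; i < pts.length; i++) {
                counts[assign[i]]++;
                for (int d = 0; d < pts[i].length; d++) sums[assign[i]][d] += pts[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue;  // an empty cluster keeps its old center
                for (int d = 0; d < sums[c].length; d++) sums[c][d] /= counts[c];
                centers[c] = sums[c];
            }
        }
        List<List<double[]>> clusters = new ArrayList<>();
        for (int c = 0; c < k; c++) clusters.add(new ArrayList<>());
        for (int i = 0; i < pts.length; i++) clusters.get(assign[i]).add(pts[i]);
        return clusters;
    }

    // Assumed quality measure: sum of squared distances from each point to its cluster mean.
    static double sse(List<List<double[]>> clusters) {
        double total = 0;
        for (List<double[]> cl : clusters) {
            if (cl.isEmpty()) continue;
            double[] mean = new double[cl.get(0).length];
            for (double[] p : cl) for (int d = 0; d < mean.length; d++) mean[d] += p[d] / cl.size();
            for (double[] p : cl) total += dist2(p, mean);
        }
        return total;
    }

    // The proposed overload: run numTrials independent trials and keep the lowest-SSE result.
    static List<List<double[]>> cluster(double[][] pts, int k, int numTrials,
                                        int maxIterationsPerTrial, Random rnd) {
        if (k < 1 || k > pts.length || numTrials < 1 || maxIterationsPerTrial < 1)
            throw new IllegalArgumentException("invalid arguments");
        List<List<double[]>> best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int t = 0; t < numTrials; t++) {
            List<List<double[]>> candidate = singleTrial(pts, k, maxIterationsPerTrial, rnd);
            double cost = sse(candidate);
            if (best == null || cost < bestCost) { best = candidate; bestCost = cost; }
        }
        return best;
    }
}
```

For example, MultiTrialKMeans.cluster(points, 2, 10, 100, new Random(42)) runs ten independent trials of at most 100 iterations each and keeps the partition with the lowest cost. Passing the Random instance in explicitly is a design choice of this sketch: it makes the multi-trial result reproducible in tests while still giving each trial a different random seeding.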