Details
Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Description
kmeans_random(
    rel_source,
    expr_point,
    k,                     -- a single value (as now) or an array of k values
    fn_dist,               -- optional
    agg_centroid,          -- optional
    max_num_iterations,    -- optional
    min_frac_reassigned,   -- optional
    k_selection_algorithm  -- optional (only applies if 'k' is an array with multiple k values)
)

kmeanspp(
    rel_source,
    expr_point,
    k,                     -- a single value (as now) or an array of k values
    fn_dist,               -- optional
    agg_centroid,          -- optional
    max_num_iterations,    -- optional
    min_frac_reassigned,   -- optional
    seeding_sample_ratio,  -- optional
    k_selection_algorithm  -- optional (only applies if 'k' is an array with multiple k values)
)
k
INTEGER or INTEGER[]. The number of centroids to calculate. Can be a single value
or an array of k values to explore. If an array of k values is given, the parameter 'k_selection_algorithm'
determines the evaluation method.
k_selection_algorithm (optional)
TEXT, default: 'elbow'. Method used to evaluate the number of centroids k. Only applies if the parameter 'k' is an array with multiple k values. Two approaches are currently supported: 'elbow' and 'silhouette'. The text can be any abbreviation of these strings; e.g., 'silh' selects the silhouette method.
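To make the 'elbow' option concrete, here is a minimal Python sketch of one common elbow heuristic: pick the k where the improvement in the clustering objective slows down the most. It is an illustration only, not MADlib's implementation; the function name `pick_k_elbow` and the objective values are hypothetical.

```python
def pick_k_elbow(ks, objectives):
    """Return the k at which the objective curve bends most sharply.

    ks         -- candidate k values in increasing order
    objectives -- clustering objective for each k (for example, the total
                  squared distance to the nearest centroid); lower is better
    """
    if len(ks) != len(objectives):
        raise ValueError("ks and objectives must have the same length")
    if len(ks) < 3:
        raise ValueError("need at least three k values to locate an elbow")

    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        # Improvement per unit of k just before and just after ks[i].
        slope_before = (objectives[i - 1] - objectives[i]) / (ks[i] - ks[i - 1])
        slope_after = (objectives[i] - objectives[i + 1]) / (ks[i + 1] - ks[i])
        # A large drop in the rate of improvement marks the elbow.
        bend = slope_before - slope_after
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Hypothetical objective values for an array of candidate k values.
ks = [2, 4, 6, 8, 10]
objectives = [100.0, 40.0, 35.0, 33.0, 32.0]
chosen = pick_k_elbow(ks, objectives)
print(chosen)
```

Dividing by the gap between consecutive k values keeps the comparison fair when the candidate array is not evenly spaced.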
For example:
SELECT * FROM madlib.kmeanspp(
    'km_sample',                  -- rel_source
    'points',                     -- expr_point
    ARRAY[2, 4, 6, 8, 10],        -- k
    'madlib.squared_dist_norm2',  -- fn_dist
    'madlib.avg',                 -- agg_centroid
    20,                           -- max_num_iterations
    0.001,                        -- min_frac_reassigned
    1.0,                          -- seeding_sample_ratio
    'elbow'                       -- k_selection_algorithm
);
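For comparison with the 'elbow' call above, the 'silhouette' option would score each candidate k by how well separated the resulting clusters are and keep the k with the highest score. The sketch below computes the standard mean silhouette coefficient in plain Python (3.8+ for `math.dist`). It is an assumption about the general technique, not MADlib's code, which may use a cheaper centroid-based variant; `mean_silhouette` and the sample data are hypothetical.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient of a clustering (higher is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of p's own cluster
        # (the distance from p to itself is 0, so it drops out of the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical data: two tight groups, scored under two labelings.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(points, [0, 0, 1, 1])  # groups match the labels
bad = mean_silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
print(good > bad)
```

To select k, one would compute this score for the assignment produced at each candidate k and keep the k with the largest value.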