Description
There is a k-medoids clustering algorithm designed specifically for large datasets. The algorithm is called CLARA (clara in R), and is fully described in chapter 3 of Finding Groups in Data: An Introduction to Cluster Analysis by Kaufman, L. and Rousseeuw, P. J. (1990).
The algorithm considers sub-datasets of fixed size (sampsize), so that the time and storage requirements become linear in n, the number of observations, rather than quadratic. Each sub-dataset is partitioned into k clusters using the same algorithm as in Partitioning Around Medoids (PAM).
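To make the sampling idea concrete, here is a minimal Python sketch, not the proposed Spark implementation. It assumes Euclidean distance, uses a simplified swap-only PAM with random initial medoids (the book's PAM has a dedicated build phase), and keeps the medoid set with the lowest total cost over the full dataset. The names pam, clara, total_cost and the samples parameter are hypothetical and chosen for illustration only.

```python
import random


def dist(a, b):
    # Euclidean distance between two points given as tuples (an assumption;
    # any dissimilarity measure could be substituted).
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_cost(points, medoids):
    # Sum of distances from each point to its nearest medoid.
    return sum(min(dist(p, m) for m in medoids) for p in points)


def pam(points, k, rng):
    # Simplified PAM: random initial medoids, then greedy swaps
    # until no single swap lowers the cost on these points.
    medoids = rng.sample(points, k)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in points:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def clara(points, k, sampsize, samples=5, seed=0):
    # Run PAM on several fixed-size sub-datasets and keep the medoids
    # that give the lowest cost when applied to the whole dataset.
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(samples):
        sample = rng.sample(points, min(sampsize, len(points)))
        medoids = pam(sample, k, rng)
        cost = total_cost(points, medoids)  # one linear pass over all n points
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost


if __name__ == "__main__":
    # Two well-separated groups on a line.
    data = [(i * 0.1, 0.0) for i in range(50)]
    data += [(1000 + i * 0.1, 0.0) for i in range(50)]
    print(clara(data, k=2, sampsize=20))
```

The quadratic pairwise work happens only inside pam, on sampsize points, while the full dataset is touched only by the linear cost evaluation. That is where the linear-in-n behaviour comes from in this sketch.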
Issue Links
- relates to SPARK-4510 Add k-medoids Partitioning Around Medoids (PAM) algorithm (Resolved)