[SPARK-5056] Implementing Clara k-medoids clustering algorithm for large datasets - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: MLlib
Labels:
- features

Description

There is a k-medoids clustering algorithm designed specifically for large datasets. It is called CLARA (clara in R) and is fully described in chapter 3 of Finding Groups in Data: An Introduction to Cluster Analysis, by Kaufman, L. and Rousseeuw, P.J. (1990).
The algorithm considers sub-datasets of fixed size (sampsize), so that time and storage requirements become linear in n, the number of observations, rather than quadratic. Each sub-dataset is partitioned into k clusters using the same algorithm as in Partitioning Around Medoids (PAM).
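
To make the sampling idea concrete, here is a minimal single-machine sketch in Python. It is not a proposed MLlib API and not the exact procedure from the book: the names clara, pam, total_cost, samples and seed are hypothetical, Euclidean distance is assumed, and the PAM step is a simplified greedy swap rather than the full BUILD/SWAP phases. Each sample's medoids are scored against the full dataset and the cheapest set is kept.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(points, k, rng):
    """Simplified PAM (assumption: greedy swaps from a random start)."""
    medoids = rng.sample(points, k)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                # Try replacing medoid i with a non-medoid point.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best = trial, cost
                    improved = True
    return medoids


def clara(points, k, sampsize, samples=5, seed=0):
    """Run PAM on several fixed-size samples; keep the medoids that are
    cheapest when evaluated on the whole dataset."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(samples):
        sample = rng.sample(points, min(sampsize, len(points)))
        medoids = pam(sample, k, rng)          # quadratic only in sampsize
        cost = total_cost(points, medoids)     # linear in n
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost
```

Calling clara(points, k=2, sampsize=6) on a list of coordinate tuples returns the chosen medoids and their total cost. Only the scoring pass touches all n points, which is where the linear rather than quadratic behaviour comes from.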

Issue Links

relates to

SPARK-4510 Add k-medoids Partitioning Around Medoids (PAM) algorithm

Resolved

People

Assignee: Unassigned

Reporter: Tomislav Milinovic

Votes: 0

Watchers: 2

Dates

Created: 02/Jan/15 09:51

Updated: 13/Jan/15 01:33

Resolved: 13/Jan/15 01:33