Spark / SPARK-3261

KMeans clusterer can return duplicate cluster centers

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 2.1.0
    • Component/s: MLlib
    • Labels: None

      Description

      Returning duplicate cluster centers is a bad design choice. I think it is preferable to produce no duplicate cluster centers, so instead of forcing the number of clusters to be exactly K, return at most K distinct clusters.
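      A minimal sketch of the proposed behavior, assuming the RDD-based MLlib API in a spark-shell session (sc is the SparkContext); the data and k values here are illustrative only:

        import org.apache.spark.mllib.clustering.KMeans
        import org.apache.spark.mllib.linalg.Vectors

        // Four distinct points but k = 10: the model cannot contain 10
        // distinct centers, so deduplicate instead of returning duplicates.
        val data = sc.parallelize(Seq(
          Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
          Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))
        val model = KMeans.train(data, 10, 20) // k = 10, maxIterations = 20
        // Compare centers by value to count the distinct ones.
        val distinct = model.clusterCenters.map(_.toArray.toSeq).distinct
        println(s"requested 10, got ${distinct.length} distinct centers")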


          Activity

          srowen Sean Owen added a comment -

          Issue resolved by pull request 15450
          https://github.com/apache/spark/pull/15450

          apachespark Apache Spark added a comment -

          User 'srowen' has created a pull request for this issue:
          https://github.com/apache/spark/pull/15450

          derrickburns Derrick Burns added a comment -

          One solution is to run KMeansParallel or KMeansRandom after each Lloyd's round to "replenish" empty clusters.

          I have implemented the former in https://github.com/derrickburns/generalized-kmeans-clustering.

          Performance is reasonable.
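          A hypothetical sketch of that replenishment step (reseedEmptyClusters is an illustrative name, not a Spark API): after a Lloyd's round, any center whose cluster received no points is replaced by a fresh point sampled from the data.

            import org.apache.spark.mllib.linalg.Vector
            import org.apache.spark.rdd.RDD

            // Illustrative only: reseed centers of empty clusters from a
            // sample of the data (k-means|| or plain random sampling could
            // supply the replacements instead).
            def reseedEmptyClusters(
                data: RDD[Vector],
                centers: Array[Vector],
                counts: Array[Long]): Array[Vector] = {
              val empty = counts.indices.filter(counts(_) == 0L)
              if (empty.isEmpty) centers
              else {
                // Sample without replacement so replacements are distinct points.
                val fresh = data.takeSample(withReplacement = false, empty.size, seed = 42L)
                val updated = centers.clone()
                empty.zip(fresh).foreach { case (i, v) => updated(i) = v }
                updated
              }
            }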

          Inspection reveals that the slow part of the KMeansParallel computation is summing the weights of the points in each cluster.

          However, this cost can be reduced by sampling the points and summing the contributions of only the sampled points. For large data sets, this approximation is appropriate.
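          A hypothetical sketch of that sampling approximation (approxClusterWeights and its parameters are illustrative, not Spark APIs): estimate each cluster's total weight from a uniform sample and rescale by the sampling fraction.

            import org.apache.spark.mllib.linalg.Vector
            import org.apache.spark.rdd.RDD

            // Illustrative only: approximate per-cluster weight sums from a
            // uniform sample instead of a full pass over the data.
            def approxClusterWeights(
                points: RDD[(Vector, Double)], // (point, weight) pairs
                assign: Vector => Int,         // maps a point to its nearest center
                fraction: Double): Map[Int, Double] = {
              points.sample(withReplacement = false, fraction)
                .map { case (p, w) => (assign(p), w) }
                .reduceByKey(_ + _)
                .mapValues(_ / fraction) // rescale sampled sums to full-data estimates
                .collect()
                .toMap
            }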

          derrickburns Derrick Burns added a comment -

          Another possible source of duplicate cluster centers is the random initialization algorithm, which samples with replacement. It needs to sample without replacement.
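          A minimal sketch of that fix, assuming the standard RDD takeSample API (initRandomDistinct is an illustrative name): drawing the k seeds without replacement guarantees k distinct points, provided the input itself contains no duplicate points.

            import org.apache.spark.mllib.linalg.Vector
            import org.apache.spark.rdd.RDD

            // Sampling with replacement can pick the same point twice and seed
            // two identical centers; sampling without replacement cannot.
            def initRandomDistinct(data: RDD[Vector], k: Int, seed: Long): Array[Vector] =
              data.takeSample(withReplacement = false, k, seed)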

          apachespark Apache Spark added a comment -

          User 'derrickburns' has created a pull request for this issue:
          https://github.com/apache/spark/pull/2634

          apachespark Apache Spark added a comment -

          User 'derrickburns' has created a pull request for this issue:
          https://github.com/apache/spark/pull/2419

          derrickburns Derrick Burns added a comment -

          This choice also adversely affects performance. I just ran clustering on 1.3M points, asking for 10,000 clusters; the run produced only 1019 unique cluster centers. The original algorithm ran for 4.5 hours, while the algorithm that does not allow duplicate cluster centers completed in 45 minutes, a 6x speedup on this dataset.


            People

            • Assignee:
              srowen Sean Owen
            • Reporter:
              derrickburns Derrick Burns
            • Votes:
              0
            • Watchers:
              6

              Dates

              • Created:
              • Updated:
              • Resolved:
