[SPARK-4039] KMeans support sparse cluster centers - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 1.1.0
Fix Version/s: None
Component/s: MLlib
Labels:
- clustering

Description

When the number of features is not known in advance, it can be helpful to create sparse vectors using HashingTF.transform. However, KMeans converts the cluster center vectors to dense vectors (https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L307), which leads to OutOfMemory errors (even with a small k).

Is there any way to keep the vectors sparse?
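
To make the memory concern concrete, here is a minimal sketch in plain Python (not Spark's API). It assumes a hash space of 2^20 slots and 64-bit floats; `hash_tf`, `sparse_mean`, and `dense_center_bytes` are hypothetical helper names used only for illustration. It contrasts the cost of k dense centers with a center kept as a sparse index-to-value map.

```python
import zlib
from collections import Counter

NUM_FEATURES = 1 << 20  # assumed hash space size, chosen for illustration


def hash_tf(terms, num_features=NUM_FEATURES):
    """Map terms to a sparse {index: count} vector via the hashing trick."""
    return dict(Counter(zlib.crc32(t.encode("utf-8")) % num_features for t in terms))


def sparse_mean(vectors):
    """Average sparse vectors without materialising the zero entries."""
    total = Counter()
    for v in vectors:
        for i, x in v.items():
            total[i] += x
    n = len(vectors)
    return {i: x / n for i, x in total.items()}


def dense_center_bytes(num_features, k, bytes_per_value=8):
    """Lower bound on memory for k dense centers of 64-bit floats."""
    return num_features * k * bytes_per_value


docs = [["spark", "kmeans", "sparse"], ["spark", "mllib"]]
center = sparse_mean([hash_tf(d) for d in docs])

# The sparse center stores only the touched indices (at most 4 here),
# while dense centers cost num_features * k * 8 bytes regardless of the data.
print(len(center), dense_center_bytes(NUM_FEATURES, k=10))
```

The dense cost grows linearly with the hash space, so a larger hash space or more partitions each holding their own copies of the centers multiplies it further. A sparse center only helps while the points in a cluster share a small set of non-zero indices; averaging many dissimilar points fills it in over time.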

Issue Links

is duplicated by

SPARK-12861 Changes to support KMeans with large feature space (Resolved)

People

Assignee: Unassigned

Reporter: Antoine Amend

Votes: 0

Watchers: 6

Dates

Created: 21/Oct/14 19:51

Updated: 27/Feb/16 19:56

Resolved: 03/Dec/15 15:28