Description
Gaussian Mixture Models (GMMs) are a popular technique for soft clustering. A GMM models the entire data set as a finite mixture of Gaussian distributions, each parameterized by a mean vector µ, a covariance matrix Σ, and a mixture weight π. In this technique, the probability of each point belonging to each cluster is computed along with the cluster statistics.
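For reference, the density being fit is the standard mixture-of-Gaussians form implied by the parameters above (the textbook formulation, not code from this patch):

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} \pi_k = 1,$$

where K is the number of mixture components and N(x | µ_k, Σ_k) is the multivariate Gaussian density with mean µ_k and covariance Σ_k.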
We have come up with an initial distributed implementation of GMM in PySpark where the parameters are estimated using the Expectation-Maximization (EM) algorithm. Our current implementation assumes a diagonal covariance matrix for each component. A minimal sketch of how one EM iteration can be distributed over an RDD is shown below.
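The sketch below illustrates the E-step/M-step split for a diagonal-covariance GMM, assuming the data is an RDD of NumPy vectors. This is not the code from the patch; the function names (log_gaussian_diag, em_step) and the driver-side parameter handling are illustrative assumptions.

```python
# Illustrative sketch only -- not the actual patch code.
import numpy as np
from pyspark import SparkContext

def log_gaussian_diag(x, mu, var):
    # Log density of a Gaussian with diagonal covariance diag(var) at point x.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

def em_step(data, weights, means, variances):
    """One EM iteration. data: RDD of 1-D NumPy arrays.
    weights: (k,), means/variances: (k, d). Returns updated parameters."""
    k = len(weights)

    def stats(x):
        # E-step: responsibility of each component for point x,
        # computed in log space for numerical stability.
        log_r = np.array([np.log(weights[j]) +
                          log_gaussian_diag(x, means[j], variances[j])
                          for j in range(k)])
        log_r -= log_r.max()
        r = np.exp(log_r)
        r /= r.sum()
        # Per-point sufficient statistics: (sum r, sum r*x, sum r*x^2).
        return r, np.outer(r, x), np.outer(r, x * x)

    # Aggregate sufficient statistics across the cluster.
    r_sum, x_sum, xx_sum = data.map(stats).reduce(
        lambda a, b: tuple(u + v for u, v in zip(a, b)))

    # M-step: closed-form updates from the aggregated statistics.
    new_weights = r_sum / r_sum.sum()
    new_means = x_sum / r_sum[:, None]
    new_vars = np.maximum(xx_sum / r_sum[:, None] - new_means ** 2, 1e-6)
    return new_weights, new_means, new_vars

if __name__ == "__main__":
    sc = SparkContext(appName="GMMSketch")
    rng = np.random.RandomState(0)
    # Toy data: two well-separated 2-D blobs.
    pts = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(5, 1, (500, 2))])
    data = sc.parallelize(pts).cache()
    weights = np.array([0.5, 0.5])
    means = pts[rng.choice(len(pts), 2)]   # random initial means
    variances = np.ones((2, 2))            # unit initial variances
    for _ in range(20):
        weights, means, variances = em_step(data, weights, means, variances)
    print(weights, means, variances, sep="\n")
    sc.stop()
```

Only the per-point statistics are computed on the executors; the M-step updates happen on the driver, since the aggregated statistics are small (O(k·d)) regardless of the data size.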
We did an initial benchmark study on a 2-node Spark standalone cluster, where each node has 8 cores and 8 GB RAM; the Spark version used is 1.0.0. We also evaluated the Python version of k-means available in Spark on the same datasets.
Below are the results from this benchmark study. The reported stats are averages over 10 runs. Tests were done on multiple datasets with varying numbers of features and instances.
| Instances | Dimensions | GMM: avg time per iteration | GMM: time for 100 iterations | k-means (Python): avg time per iteration | k-means (Python): time for 100 iterations |
| --- | --- | --- | --- | --- | --- |
| 0.7 million | 13 | 7 s | 12 min | 13 s | 26 min |
| 1.8 million | 11 | 17 s | 29 min | 33 s | 53 min |
| 10 million | 16 | 1.6 min | 2.7 hr | 1.2 min | 2 hr |
Issue Links
- is duplicated by:
  - SPARK-951 Gaussian Mixture Model (Resolved)
  - SPARK-4156 Add expectation maximization for Gaussian mixture models to MLLib clustering (Resolved)
  - SPARK-952 Python version of Gaussian Mixture Model (Resolved)
- is related to:
  - SPARK-4156 Add expectation maximization for Gaussian mixture models to MLLib clustering (Resolved)