[SPARK-8540] KMeans-based outlier detection - ASF JIRA

Proposal for K-Means-based outlier detection:

- Cluster data using K-Means.
- Provide prediction/filtering functionality that returns outliers/anomalies.
  - This can take a threshold parameter that specifies either (a) how far off a point needs to be to be considered an outlier or (b) how many outliers should be returned.

Note this will require a bit of API design, which should probably be posted and discussed on this JIRA before implementation.
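
As a starting point for that discussion, here is a minimal sketch of the two steps in plain Python (standard library only, not Spark). Everything in it is an assumption made for illustration: the names `kmeans`, `outlier_scores`, `find_outliers`, `max_distance`, and `top_n` are hypothetical and not part of any existing MLlib API, and "how far off" is interpreted as the Euclidean distance from a point to its nearest cluster center, which the proposal does not specify.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]


def kmeans(points: Sequence[Point], k: int, iterations: int = 20) -> List[List[float]]:
    """Plain Lloyd's algorithm, seeded with the first k points for determinism."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(coord) / len(group) for coord in zip(*group)]
    return centers


def outlier_scores(points: Sequence[Point], centers: Sequence[Point]) -> List[float]:
    """Assumed score: distance from each point to its nearest cluster center."""
    return [min(math.dist(p, c) for c in centers) for p in points]


def find_outliers(
    points: Sequence[Point],
    centers: Sequence[Point],
    max_distance: Optional[float] = None,  # option (a): distance threshold
    top_n: Optional[int] = None,           # option (b): number of outliers to return
) -> List[int]:
    """Return indices of outliers, most anomalous first."""
    if (max_distance is None) == (top_n is None):
        raise ValueError("specify exactly one of max_distance or top_n")
    ranked = sorted(
        zip(outlier_scores(points, centers), range(len(points))), reverse=True
    )
    if top_n is not None:
        ranked = ranked[:top_n]
    else:
        ranked = [(score, i) for score, i in ranked if score > max_distance]
    return [i for _, i in ranked]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 30)]
    centers = kmeans(data, k=2)
    print(find_outliers(data, centers, top_n=1))
    print(find_outliers(data, centers, max_distance=10.0))
```

Requiring exactly one of the two parameters keeps options (a) and (b) mutually exclusive; whether a real API should allow both at once is one of the design questions to settle here.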

Estimated: 336h
Remaining: 336h
Logged: Not Specified