[SPARK-23528] Add numIter to ClusteringSummary - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.2.1
Fix Version/s: 2.4.0
Component/s: ML
Labels: None

Description

Spark ML should expose vital statistics of the GMM model:

Number of iterations (actual, not max) until the tolerance threshold was hit: we can set a maximum, but how do we know the limit was large enough, and how many iterations it really took?

Follow up: Final log likelihood of the model: if we run multiple times with different starting conditions, how do we know which run converged to the better fit?
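
To make the two requested statistics concrete, here is a minimal sketch in plain Python of an EM loop for a one-dimensional Gaussian mixture. It reports the number of iterations actually run and the final log likelihood. This is not Spark's implementation: the function name `fit_gmm_1d`, its parameters, the synthetic data, and the starting means are all hypothetical and chosen only for illustration.

```python
import math
import random


def fit_gmm_1d(data, init_means, max_iter=100, tol=1e-6):
    """Fit a 1-D Gaussian mixture by EM (illustrative sketch, not Spark's code).

    Returns (num_iter, log_likelihood, means): the iterations actually run,
    the final log likelihood, and the fitted component means.
    """
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k      # assumed starting variances
    weights = [1.0 / k] * k    # assumed uniform starting weights

    def density(x, mean, var):
        return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

    def log_likelihood():
        total = 0.0
        for x in data:
            total += math.log(
                sum(w * density(x, m, v) for w, m, v in zip(weights, means, variances))
            )
        return total

    prev_ll = log_likelihood()
    num_iter = 0
    while num_iter < max_iter:
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * density(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
        num_iter += 1
        ll = log_likelihood()
        converged = abs(ll - prev_ll) < tol
        prev_ll = ll
        if converged:
            break  # tolerance hit before max_iter
    return num_iter, prev_ll, means


# Synthetic data: two well-separated clusters (hypothetical example input).
rng = random.Random(0)
data = [rng.gauss(-5.0, 1.0) for _ in range(100)] + [rng.gauss(5.0, 1.0) for _ in range(100)]

# Run from two different starting conditions and keep the better fit.
runs = [fit_gmm_1d(data, start, max_iter=200) for start in ([-4.0, 4.0], [0.0, 0.1])]
best = max(runs, key=lambda run: run[1])
for num_iter, ll, means in runs:
    print(num_iter, ll, means)
```

If `num_iter` equals `max_iter`, the loop stopped at the cap rather than at the tolerance, which suggests the limit may have been too small. Comparing the returned log likelihoods across runs is what allows picking the run that converged to the better fit.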

Issue Links

contains: SPARK-24973 Add numIter to Python ClusteringSummary (Resolved)

links to: [Github] Pull Request #20701 (mgaido91)

People

Assignee: Marco Gaido

Reporter: Erich Schubert

Votes: 0

Watchers: 4

Dates

Created: 27/Feb/18 21:59

Updated: 31/Jul/18 18:25

Resolved: 13/Jul/18 18:26