Description
Spark ML should expose vital statistics of a fitted GMM model:
- Number of iterations actually run (not the configured maximum) before the tolerance threshold was hit: we can set maxIter, but without this value we cannot tell whether the limit was large enough or how many iterations the fit really took.
- Follow-up: the final log likelihood of the model: if we run the fit multiple times with different starting conditions, this is how we know which run converged to the better fit. (A sketch of reading both values is shown after this list.)
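A minimal sketch of what this would enable, assuming a Spark 3.0+ PySpark API where ClusteringSummary exposes numIter (per SPARK-24973) and GaussianMixtureSummary exposes logLikelihood; the toy data, seeds, and model-selection loop are illustrative only:

{code:python}
from pyspark.sql import SparkSession
from pyspark.ml.clustering import GaussianMixture
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("gmm-summary-sketch").getOrCreate()

# Tiny illustrative dataset with two loose clusters.
data = spark.createDataFrame(
    [(Vectors.dense([-0.10, -0.05]),),
     (Vectors.dense([-0.01, -0.10]),),
     (Vectors.dense([0.90, 0.80]),),
     (Vectors.dense([0.75, 0.935]),)],
    ["features"])

best = None
for seed in (1, 2, 3):  # multiple runs with different starting conditions
    model = GaussianMixture(k=2, maxIter=100, tol=0.01, seed=seed).fit(data)
    summary = model.summary
    # numIter: iterations actually run, so we can see whether maxIter was hit.
    # logLikelihood: final log likelihood, so we can compare runs.
    print(f"seed={seed} numIter={summary.numIter} "
          f"logLikelihood={summary.logLikelihood}")
    if best is None or summary.logLikelihood > best[0]:
        best = (summary.logLikelihood, seed)

print(f"best run: seed={best[1]} with logLikelihood={best[0]}")
spark.stop()
{code}

With numIter equal to maxIter, the run likely stopped at the cap rather than converging; with it strictly below, the tolerance was reached. Comparing logLikelihood across seeds then picks the best of the converged fits.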
Issue Links
- contains SPARK-24973 Add numIter to Python ClusteringSummary (Resolved)