Spark / SPARK-23528

Add numIter to ClusteringSummary

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.1
    • Fix Version/s: 2.4.0
    • Component/s: ML
    • Labels: None

    Description

      Spark ML should expose vital statistics of the GMM model:

      • Number of iterations (actual, not maximum) until the tolerance threshold was hit. We can set a maximum, but how do we know whether that limit was large enough, and how many iterations training really took?

      Follow-up: the final log-likelihood of the model. If we run training multiple times with different starting conditions, how do we know which run converged to the better fit?
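
      To illustrate why these two statistics matter, here is a minimal sketch of an EM loop for a 1-D Gaussian mixture that reports both of them. It is plain Python and not Spark's implementation; the names fit_gmm_1d, normal_pdf, MAX_ITER and the synthetic data are all hypothetical and exist only for this example.

```python
import math
import random

MAX_ITER = 200  # hypothetical iteration cap for this example


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, max_iter=MAX_ITER, tol=1e-6, seed=0):
    """Fit a 1-D Gaussian mixture with EM (illustrative sketch only).

    Returns (weights, means, variances, num_iter, log_likelihood).
    num_iter is the number of iterations actually run (<= max_iter);
    log_likelihood is the value computed in the last E-step.
    """
    rng = random.Random(seed)
    means = rng.sample(data, k)      # starting means depend on the seed
    variances = [1.0] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    log_likelihood = -math.inf
    num_iter = 0

    for _ in range(max_iter):
        num_iter += 1
        # E-step: responsibilities and log-likelihood under current parameters.
        resp = []
        log_likelihood = 0.0
        for x in data:
            probs = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = max(sum(probs), 1e-300)  # floor guards against log(0)
            log_likelihood += math.log(total)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsed component
        # Stop once the log-likelihood improvement drops below the tolerance.
        if abs(log_likelihood - prev_ll) < tol:
            break
        prev_ll = log_likelihood
    return weights, means, variances, num_iter, log_likelihood


# Synthetic data: two clusters, centred near 0 and 5.
data_rng = random.Random(42)
data = ([data_rng.gauss(0.0, 1.0) for _ in range(200)]
        + [data_rng.gauss(5.0, 1.0) for _ in range(200)])

# Several runs with different starting conditions.
runs = [fit_gmm_1d(data, k=2, seed=s) for s in range(3)]
for weights, means, variances, num_iter, ll in runs:
    note = "hit the iteration cap" if num_iter == MAX_ITER else "stopped on tolerance"
    print(f"iterations={num_iter} ({note}), logLikelihood={ll:.3f}")

# The final log-likelihood is what lets us pick the better fit.
best = max(runs, key=lambda run: run[4])
```

      Without the iteration count, a run that stopped because it reached the cap is indistinguishable from one that converged; without the final log-likelihood, there is no basis for choosing among runs started from different seeds.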

            People

              Assignee: Marco Gaido (mgaido)
              Reporter: Erich Schubert (erich.schubert)
              Votes: 0
              Watchers: 4
