Spark / SPARK-31217

Unnecessary persist on cumulativeCounts in BinaryClassificationMetrics

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.4, 2.4.5
    • Fix Version/s: None
    • Component/s: ML, MLlib

    Description

      In mllib.evaluation.BinaryClassificationMetrics, cumulativeCounts is persisted as part of a lazy initialization. However, when I run LogisticRegressionSummaryExample and ModelSelectionViaCrossValidationExample, I find that the cached cumulativeCounts is used by only one action during execution, so the cached data is never reused.
      So I think it should not be cached during initialization. Instead, we could add an explicit persist() API to this class, mirroring the existing unpersist() API in BinaryClassificationMetrics that releases the cached cumulativeCounts.
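
To make the proposal concrete, here is a minimal sketch of opt-in caching in plain Python. It is not Spark code: `BinaryMetricsSketch`, `roc_points`, `pr_points`, and the `compute_calls` counter are hypothetical names invented for illustration, and an in-memory list stands in for the RDD. Only the `persist()`/`unpersist()` pairing reflects what the description suggests.

```python
from itertools import accumulate


class BinaryMetricsSketch:
    """Toy in-memory stand-in for the opt-in caching idea (hypothetical, not Spark)."""

    def __init__(self, score_and_labels):
        self._data = list(score_and_labels)
        self._persist_requested = False
        self._cached = None       # filled only after persist() has been called
        self.compute_calls = 0    # counts recomputations of the cumulative counts

    def _compute_cumulative_counts(self):
        self.compute_calls += 1
        # Sort by descending score, then accumulate (positives, negatives).
        ordered = sorted(self._data, key=lambda sl: -sl[0])
        pos = accumulate(1 if label == 1 else 0 for _, label in ordered)
        neg = accumulate(0 if label == 1 else 1 for _, label in ordered)
        return list(zip(pos, neg))

    def persist(self):
        """Opt in to caching; worthwhile only if several metrics are computed."""
        self._persist_requested = True
        return self

    def unpersist(self):
        """Release the cached counts and stop caching."""
        self._persist_requested = False
        self._cached = None

    def _cumulative_counts(self):
        if not self._persist_requested:
            return self._compute_cumulative_counts()
        if self._cached is None:
            self._cached = self._compute_cumulative_counts()
        return self._cached

    def roc_points(self):
        # Assumes the data contains at least one positive and one negative label.
        counts = self._cumulative_counts()
        total_pos, total_neg = counts[-1]
        return [(n / total_neg, p / total_pos) for p, n in counts]

    def pr_points(self):
        counts = self._cumulative_counts()
        total_pos, _ = counts[-1]
        return [(p / total_pos, p / (p + n)) for p, n in counts]


data = [(0.9, 1), (0.8, 0), (0.7, 1), (0.3, 0)]

# Single action: nothing is cached, so nothing needs to be released.
one_shot = BinaryMetricsSketch(data)
one_shot.roc_points()

# Several actions: the caller opts in, reuses the counts, then releases them.
multi = BinaryMetricsSketch(data).persist()
multi.roc_points()
multi.pr_points()
multi.unpersist()
```

In this sketch the single-metric path never holds cached data, while a caller that computes several metrics pays for the cumulative counts once. Whether the same trade-off holds in Spark depends on how many actions actually touch cumulativeCounts in a given job.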


          People

            Assignee: Unassigned
            Reporter: spark_cachecheck IcySanwitch
            Votes: 0
            Watchers: 1
