Spark / SPARK-31217

Unnecessary persist on cumulativeCounts in BinaryClassificationMetrics

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.4, 2.4.5
    • Fix Version/s: None
    • Component/s: ML, MLlib
    • Labels: None

      Description

      In mllib.evaluation.BinaryClassificationMetrics, cumulativeCounts is persisted as part of its lazy initialization. However, when I run LogisticRegressionSummaryExample and ModelSelectionViaCrossValidationExample, I find that the cached cumulativeCounts is used by only one action during execution, so the cache is never reused.
      I therefore think it should not be cached during initialization. Instead, we could add an explicit persist() API to this class, mirroring the existing unpersist() API in BinaryClassificationMetrics that releases the cached cumulativeCounts.
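
      To make the scenario concrete, here is a minimal sketch of a program that requests a single metric and then releases the cache through the existing unpersist() call. The object name, the local master setting, and the (score, label) pairs are assumptions made up for illustration; they are not taken from the two examples named above. The sketch does not implement the proposed persist() API.

```scala
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.SparkSession

// Hypothetical driver object, for illustration only.
object SingleMetricSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SingleMetricSketch")
      .master("local[*]") // assumption: local run
      .getOrCreate()

    // Made-up (score, label) pairs; real code would use model predictions.
    val scoreAndLabels = spark.sparkContext.parallelize(Seq(
      (0.9, 1.0), (0.8, 1.0), (0.6, 0.0), (0.4, 1.0), (0.2, 0.0)
    ))

    val metrics = new BinaryClassificationMetrics(scoreAndLabels)

    // The only metric this program requests. cumulativeCounts is
    // initialized lazily, and persisted, the first time it is needed.
    val auc = metrics.areaUnderROC()
    println(s"Area under ROC = $auc")

    // Nothing else reads the metrics, so release the cached
    // cumulativeCounts explicitly with the existing API.
    metrics.unpersist()

    spark.stop()
  }
}
```

      In a program like this the persisted data is written once and, if the observation above holds, not read again, which is the cost the issue suggests making opt-in.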

            People

            • Assignee: Unassigned
            • Reporter: spark_cachecheck IcySanwitch
            • Votes: 0
            • Watchers: 1
