- Type: Sub-task
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
- Affects Version/s: 2.4.3
- Fix Version/s: None
- Component/s: MLlib
- Labels: None
The RDD produced by scoreAndLabels.combineByKey is evaluated twice, once by sortByKey (which runs a sampling job to compute its range boundaries) and once by count(), so it should be persisted.
val counts = scoreAndLabels.combineByKey(
  createCombiner = (label: Double) => new BinaryLabelCounter(0L, 0L) += label,
  mergeValue = (c: BinaryLabelCounter, label: Double) => c += label,
  mergeCombiners = (c1: BinaryLabelCounter, c2: BinaryLabelCounter) => c1 += c2
).sortByKey(ascending = false) // first use

val binnedCounts =
  // Only down-sample if bins is > 0
  if (numBins == 0) {
    // Use original directly
    counts
  } else {
    val countsSize = counts.count() // second use
This issue was reported by our tool CacheCheck, which dynamically detects misuses of the persist()/unpersist() API.
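A minimal sketch of what the suggested fix might look like, not the actual MLlib patch. It assumes a local SparkContext and uses a hypothetical LabelCounts case class as a stand-in for BinaryLabelCounter, which is not public API. The names PersistSketch, combined, and the sample data are illustrative only. The combineByKey result is persisted before sortByKey and count() run, then released once both uses are done.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  // Hypothetical stand-in for MLlib's BinaryLabelCounter (illustration only).
  final case class LabelCounts(pos: Long, neg: Long) {
    def add(label: Double): LabelCounts =
      if (label > 0.5) copy(pos = pos + 1) else copy(neg = neg + 1)
    def merge(other: LabelCounts): LabelCounts =
      LabelCounts(pos + other.pos, neg + other.neg)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("persist-sketch")
    val sc = new SparkContext(conf)
    try {
      // Made-up (score, label) pairs, for illustration only.
      val scoreAndLabels =
        sc.parallelize(Seq((0.9, 1.0), (0.9, 0.0), (0.4, 0.0), (0.1, 1.0)))

      // Persist the combined RDD so later jobs can reuse it.
      val combined = scoreAndLabels.combineByKey(
        (label: Double) => LabelCounts(0L, 0L).add(label),
        (c: LabelCounts, label: Double) => c.add(label),
        (c1: LabelCounts, c2: LabelCounts) => c1.merge(c2)
      ).persist(StorageLevel.MEMORY_AND_DISK)

      val counts = combined.sortByKey(ascending = false) // first use
      val countsSize = counts.count()                    // second use
      println(s"distinct scores: $countsSize")

      // Release the cached blocks once they are no longer needed.
      combined.unpersist()
    } finally {
      sc.stop()
    }
  }
}
```

Whether MEMORY_AND_DISK is the right storage level here is a judgment call. The point of the sketch is only that persist() is called before the first job and unpersist() after the last one.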
- duplicates SPARK-29818 Missing persist on RDD (Resolved)