Details
- Sub-task
- Status: Resolved
- Minor
- Resolution: Duplicate
- 2.4.3
- None
- None
Description
The RDD produced by scoreAndLabels.combineByKey is used twice, first by sortByKey and then by count(), so it should be persisted.
val counts = scoreAndLabels.combineByKey(
  createCombiner = (label: Double) => new BinaryLabelCounter(0L, 0L) += label,
  mergeValue = (c: BinaryLabelCounter, label: Double) => c += label,
  mergeCombiners = (c1: BinaryLabelCounter, c2: BinaryLabelCounter) => c1 += c2
).sortByKey(ascending = false) // first use

val binnedCounts =
  // Only down-sample if bins is > 0
  if (numBins == 0) {
    // Use original directly
    counts
  } else {
    val countsSize = counts.count() // second use
This issue was reported by our tool CacheCheck, which dynamically detects misuses of the persist()/unpersist() API.
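One possible way to apply the suggestion is sketched below: persist the combineByKey result before the sort, then release it once the count has run. This is only an illustration, not the fix adopted in Spark. The object name PersistSketch, the helper countDistinctScores, the sample data, and the storage level are all assumptions, and a plain (positives, negatives) tuple stands in for BinaryLabelCounter, which is private to Spark.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical sketch: names and data are illustrative, not taken from Spark's source.
object PersistSketch {

  // Counts distinct scores while persisting the combined RDD across both uses.
  def countDistinctScores(sc: SparkContext, data: Seq[(Double, Double)]): Long = {
    val scoreAndLabels = sc.parallelize(data)

    // Stand-in for BinaryLabelCounter: (numPositives, numNegatives) per score.
    val combined = scoreAndLabels.combineByKey(
      (label: Double) => if (label > 0.5) (1L, 0L) else (0L, 1L),
      (c: (Long, Long), label: Double) =>
        if (label > 0.5) (c._1 + 1L, c._2) else (c._1, c._2 + 1L),
      (c1: (Long, Long), c2: (Long, Long)) => (c1._1 + c2._1, c1._2 + c2._2)
    ).persist(StorageLevel.MEMORY_AND_DISK) // assumed storage level

    val counts = combined.sortByKey(ascending = false) // first use
    val countsSize = counts.count()                    // second use

    // Release the cached blocks once no further job needs them.
    combined.unpersist()
    countsSize
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("persist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val sample = Seq((0.9, 1.0), (0.9, 0.0), (0.4, 1.0), (0.1, 0.0))
      println(s"distinct scores: ${countDistinctScores(sc, sample)}")
    } finally {
      sc.stop()
    }
  }
}
```

In real code the unpersist() call would have to move after the last job that depends on counts, otherwise later jobs recompute the lineage.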
Issue Links
- duplicates SPARK-29818 Missing persist on RDD (Resolved)
- links to