Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Duplicate
- 2.4.3
- None
- None
Description
The RDD sources is used by more than one action: first(), and the actions inside computePrincipalComponentsAndExplainedVariance(). It therefore needs to be persisted; otherwise its lineage is recomputed for each action.
def fit(sources: RDD[Vector]): PCAModel = {
  // first use rdd sources on action first()
  val numFeatures = sources.first().size
  require(k <= numFeatures,
    s"source vector size $numFeatures must be no less than k=$k")

  require(PCAUtil.memoryCost(k, numFeatures) < Int.MaxValue,
    "The param k and numFeatures is too large for SVD computation. " +
    "Try reducing the parameter k for PCA, or reduce the input feature " +
    "vector dimension to make this tractable.")

  val mat = new RowMatrix(sources)
  // second use rdd sources
  val (pc, explainedVariance) = mat.computePrincipalComponentsAndExplainedVariance(k)
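One possible way to avoid the repeated computation is sketched below. It is an illustration only, not the change that was eventually merged. The helper withPersisted, the object PersistSketch and the choice of StorageLevel.MEMORY_AND_DISK are assumptions introduced here. The helper caches the input only when the caller has not already persisted it, and releases the cache once both uses have finished.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrix, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistSketch {

  // Hypothetical helper (not part of Spark): persist `rdd` for the duration
  // of `body`, but only if the caller has not already chosen a storage level.
  def withPersisted[T, R](rdd: RDD[T])(body: RDD[T] => R): R = {
    val handlePersistence = rdd.getStorageLevel == StorageLevel.NONE
    // Storage level is an assumption; pick what fits the cluster's memory.
    if (handlePersistence) rdd.persist(StorageLevel.MEMORY_AND_DISK)
    try body(rdd)
    finally if (handlePersistence) rdd.unpersist()
  }

  // Mirrors the two uses of `sources` from the snippet above.
  def principalComponents(sources: RDD[Vector], k: Int): (Matrix, Vector) =
    withPersisted(sources) { cached =>
      // first use: the action first()
      val numFeatures = cached.first().size
      require(k <= numFeatures,
        s"source vector size $numFeatures must be no less than k=$k")

      // second use: the actions triggered inside RowMatrix
      val mat = new RowMatrix(cached)
      mat.computePrincipalComponentsAndExplainedVariance(k)
    }
}
```

Checking getStorageLevel first leaves an input that the caller has already cached untouched, so the helper never unpersists data it did not persist itself.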
Issue Links
- duplicates SPARK-29818 Missing persist on RDD (Resolved)
- links to