Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Duplicate
- 2.4.3
- None
- None
Description
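The RDD dataset is used by more than two actions, in learnVocab(dataset) and in doFit. Because it is not persisted, each action recomputes it from its lineage, so it should be persisted before the first action runs.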
def fit[S <: Iterable[String]](dataset: RDD[S]): Word2VecModel = {
  // Needs to persist dataset here
  learnVocab(dataset) // has action on dataset
  createBinaryTree()
  val sc = dataset.context
  val expTable = sc.broadcast(createExpTable())
  val bcVocab = sc.broadcast(vocab)
  val bcVocabHash = sc.broadcast(vocabHash)
  try {
    doFit(dataset, sc, expTable, bcVocab, bcVocabHash) // has action on dataset
This issue was reported by our tool CacheCheck, which dynamically detects misuses of the persist()/unpersist() API.
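One possible shape for a fix is sketched below. It is only an illustration, not the change that was merged for SPARK-29818: withPersisted and PersistSketch are hypothetical names, and MEMORY_AND_DISK is an assumed storage level. The sketch persists the RDD only when the caller has not already chosen a storage level, runs all actions, and unpersists afterwards. It uses only RDD.getStorageLevel, RDD.persist, and RDD.unpersist.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  // Hypothetical helper (not part of Spark): keeps `rdd` cached while `body`
  // runs, unless the caller has already persisted it.
  def withPersisted[T, R](rdd: RDD[T])(body: RDD[T] => R): R = {
    // Only manage caching if the caller has not set a storage level.
    val handlePersistence = rdd.getStorageLevel == StorageLevel.NONE
    if (handlePersistence) rdd.persist(StorageLevel.MEMORY_AND_DISK) // assumed level
    try body(rdd)
    finally if (handlePersistence) rdd.unpersist()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("persist-sketch")
    val sc = new SparkContext(conf)
    try {
      val sentences = sc.parallelize(Seq(Seq("a", "b"), Seq("b", "c")))
      // Two actions on the same RDD, analogous to learnVocab and doFit.
      val (total, distinct) = withPersisted(sentences) { ds =>
        val words = ds.flatMap(identity)
        (words.count(), words.distinct().count())
      }
      println(s"total=$total distinct=$distinct")
    } finally {
      sc.stop()
    }
  }
}
```

In fit, the same pattern would wrap the calls to learnVocab(dataset) and doFit(...), so that both read the cached data and the cache is released even if training throws.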
Issue Links
- duplicates
- SPARK-29818 Missing persist on RDD
- Resolved