Three broadcast variables created at the beginning of Word2Vec.fit() are never destroyed or unpersisted. This appears to cause excessive memory consumption on the driver in a job that performs hundreds of successive training runs.
val expTable = sc.broadcast(createExpTable())
val bcVocab = sc.broadcast(vocab)
val bcVocabHash = sc.broadcast(vocabHash)
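One possible way to address this is to release each broadcast once the job that reads it has finished. The sketch below is not the Word2Vec code: `BroadcastCleanupSketch` and `lookupSum` are hypothetical names, and the small lookup table only stands in for `expTable`. It shows the general pattern of pairing `sc.broadcast` with `unpersist()` in a `finally` block; `destroy()` could be used instead if the broadcast is certain never to be read again.

```scala
import org.apache.spark.SparkContext

// Hypothetical helper, not part of Spark MLlib.
object BroadcastCleanupSketch {

  // Sums table(i) over the given indices on the executors, then
  // releases the broadcast so repeated calls do not accumulate copies.
  def lookupSum(sc: SparkContext, table: Array[Float], indices: Seq[Int]): Double = {
    val bcTable = sc.broadcast(table) // plays the role of expTable
    try {
      sc.parallelize(indices)
        .map(i => bcTable.value(i).toDouble)
        .sum()
    } finally {
      // Remove the cached copies from the executors. The value would be
      // re-sent if bcTable were used again; destroy() is the stricter
      // option and also drops the driver-side state.
      bcTable.unpersist()
    }
  }
}
```

Applied to Word2Vec.fit(), the same idea would mean releasing `expTable`, `bcVocab` and `bcVocabHash` after the training loop completes, so that hundreds of successive calls do not each leave three broadcasts behind.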
- relates to
SPARK-11898 Use broadcast for the global tables in Word2Vec