Description
x=sc.parallelize([1,3,10,9]).cache()
x.count()
x=sc.parallelize([1,3,10,9]).cache()
x.count()
sqlContext.clearCache()
Overwriting x creates an orphan cached RDD that cannot be removed with clearCache(). This happens in both Scala and PySpark.
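One way to avoid leaving an orphan behind (a minimal workaround sketch, not part of the original report) is to unpersist the cached RDD explicitly before rebinding the variable:

x = sc.parallelize([1, 3, 10, 9]).cache()
x.count()
x.unpersist()   # drop the cached blocks while x still points at the first RDD
x = sc.parallelize([1, 3, 10, 9]).cache()
x.count()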
A similar thing happens for RDDs created from a DataFrame in Python:
spark.read.csv(....).rdd
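A fuller sketch of that case, assuming a hypothetical CSV path and the spark.catalog API (neither is from the original report):

y = spark.read.csv("/tmp/example.csv").rdd.cache()   # hypothetical input path
y.count()
y = spark.read.csv("/tmp/example.csv").rdd.cache()   # rebinding y orphans the first cached RDD
y.count()
spark.catalog.clearCache()   # the orphaned RDD is still cached afterwards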
However, in Scala, clearCache() can remove some of the orphaned RDD types.
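To check which RDDs are still cached, one option (an assumption on my part; it relies on the internal sc._jsc handle to the JVM JavaSparkContext) is to list the persistent RDDs from PySpark:

# getPersistentRDDs() returns a map of RDD id -> JavaRDD for everything still marked persistent
persistent = sc._jsc.getPersistentRDDs()
print(list(persistent.keys()))   # orphan RDD ids remain listed here even after clearCache()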