Spark / SPARK-19504

clearCache fails to delete orphan RDDs, especially in pyspark


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: Optimizer, SQL
    • Environment: Both PySpark and Scala Spark, although Scala Spark uncaches some RDD types even if orphaned
    • Labels: Important

    Description

      x = sc.parallelize([1, 3, 10, 9]).cache()
      x.count()
      x = sc.parallelize([1, 3, 10, 9]).cache()
      x.count()
      sqlContext.clearCache()

      Overwriting x leaves the first cached RDD orphaned: it still occupies storage, but it can no longer be uncached with clearCache(). This happens in both Scala and PySpark.

      A similar problem occurs for RDDs created from a DataFrame in Python:
      spark.read.csv(....).rdd
      However, in Scala, clearCache can get rid of some orphan RDD types.


          People

            Assignee: Unassigned
            Reporter: rkarimi
            Votes: 1
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: 336h
                Remaining: 336h
                Logged: Not Specified