Spark / SPARK-30494

Duplicated cached RDD when creating or replacing an existing view


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0
    • Fix Version/s: 2.4.6, 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      The issue can be reproduced with the following commands:

      beeline> create or replace temporary view temp1 as select 1;
      beeline> cache table temp1;
      beeline> create or replace temporary view temp1 as select 1, 2;
      beeline> cache table temp1;

      The cached RDD for the plan "select 1" stays in memory until the session is closed. This cached data can never be used again, because the view temp1 has been replaced by a different plan. The result is a memory leak.

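      After these commands, both plans are still registered in the cache manager; the second assertion shows the stale entry for the replaced view:

      assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
      assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)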
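
The leak can be illustrated with a minimal, self-contained Scala sketch. It is a hypothetical toy model, not Spark's CacheManager: `ViewCacheLeakSketch`, `ToyCacheManager`, `ToySession`, `runRepro` and the `fixLeak` flag are names invented for illustration, a plan is represented by its SQL text, and two plans count as the same when their strings are equal (Spark itself compares analyzed plans). With `fixLeak = false` the model reproduces the reported behavior; with `fixLeak = true` it uncaches the view's previous plan before replacing it, which is one plausible remedy and not necessarily how the actual Spark fix works.

```scala
import scala.collection.mutable

object ViewCacheLeakSketch {

  // Hypothetical stand-in for a cache manager: maps a plan (its SQL text)
  // to a placeholder string that represents the cached RDD.
  final class ToyCacheManager {
    private val entries = mutable.LinkedHashMap.empty[String, String]

    def cacheQuery(plan: String): Unit =
      if (!entries.contains(plan)) entries(plan) = s"cached RDD for [$plan]"

    def uncacheQuery(plan: String): Unit = {
      entries.remove(plan)
      ()
    }

    def lookupCachedData(plan: String): Option[String] = entries.get(plan)

    // Plans that currently hold cached data, in insertion order.
    def cachedPlans: Seq[String] = entries.keys.toSeq
  }

  // Hypothetical session holding temp views. fixLeak = false mimics the
  // reported behavior; fixLeak = true uncaches the old plan on replacement.
  final class ToySession(fixLeak: Boolean) {
    val cacheManager = new ToyCacheManager
    private val tempViews = mutable.Map.empty[String, String]

    def createOrReplaceTempView(name: String, plan: String): Unit = {
      if (fixLeak) {
        // Drop the old entry only if the view now points at a different plan.
        tempViews.get(name).filter(_ != plan).foreach(cacheManager.uncacheQuery)
      }
      tempViews(name) = plan
    }

    // Like CACHE TABLE: caches whatever plan the view currently refers to.
    def cacheTable(name: String): Unit = cacheManager.cacheQuery(tempViews(name))
  }

  // Replays the four beeline statements and returns the plans left in the cache.
  def runRepro(session: ToySession): Seq[String] = {
    session.createOrReplaceTempView("temp1", "select 1")
    session.cacheTable("temp1")
    session.createOrReplaceTempView("temp1", "select 1, 2")
    session.cacheTable("temp1")
    session.cacheManager.cachedPlans
  }
}
```

Running `runRepro` on a `ToySession(fixLeak = false)` leaves both "select 1" and "select 1, 2" in the cache, mirroring the two assertions above, while a `ToySession(fixLeak = true)` keeps only "select 1, 2". The sketch uncaches only when the new plan differs from the old one, so replacing a view with an identical query keeps a cache entry that is still valid.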



          People

            Assignee: Lantao Jin (cltlfcjin)
            Reporter: Lantao Jin (cltlfcjin)
            Votes: 0
            Watchers: 3
