
dropTempView has a critical BUG

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Problem
    • Affects Version/s: 2.1.1, 2.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Flags: Important

    Description

      When I call dropTempView on dwd_table1 only, the dependent table dwd_table2 also disappears from the Storage page at http://127.0.0.1:4040/storage/.
      This affects versions 2.1.1 and 2.2.0; version 2.1.0 does not have the problem.

      import org.apache.spark.sql.{Row, SparkSession}
      import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

      val spark = SparkSession.builder.master("local").appName("sparkTest").getOrCreate()
      val rows = Seq(Row("p1", 30), Row("p2", 20), Row("p3", 25), Row("p4", 10), Row("p5", 40), Row("p6", 15))
      val schema = new StructType().add(StructField("name", StringType)).add(StructField("age", IntegerType))

      // Register and cache the base table.
      val rowRDD = spark.sparkContext.parallelize(rows, 3)
      val df = spark.createDataFrame(rowRDD, schema)
      df.createOrReplaceTempView("ods_table")
      spark.sql("cache table ods_table")

      // Cache two derived tables; dwd_table2 is built on top of dwd_table1.
      spark.sql("cache table dwd_table1 as select * from ods_table where age>=25")
      spark.sql("cache table dwd_table2 as select * from dwd_table1 where name='p1'")

      // Drop only the intermediate view.
      spark.catalog.dropTempView("dwd_table1")
      //spark.catalog.dropTempView("ods_table")

      // The query still runs, but dwd_table2 is no longer listed on the Storage page.
      spark.sql("select * from dwd_table2").show()
      

      ods_table1 is kept in memory even though it will not be used anymore. This wastes memory, especially when my service diagram is much more complex.
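The report relies on the Storage page of the web UI to see which tables are cached. The same observation can probably be made programmatically with `spark.catalog.isCached`. The following is only a minimal sketch under assumptions: the object name `DropTempViewCacheCheck` and the helper `cachedBeforeAndAfterDrop` are hypothetical, and the rows are built with `toDF` instead of an explicit schema for brevity. What it prints after the drop depends on the Spark version in use; the report describes the behavior seen on 2.1.1 and 2.2.0.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of the original report.
object DropTempViewCacheCheck {

  // Rebuilds the three cached tables from the report, drops dwd_table1,
  // and returns whether dwd_table2 was cached before and after the drop.
  def cachedBeforeAndAfterDrop(spark: SparkSession): (Boolean, Boolean) = {
    import spark.implicits._

    // Same data and view names as in the report.
    Seq(("p1", 30), ("p2", 20), ("p3", 25), ("p4", 10), ("p5", 40), ("p6", 15))
      .toDF("name", "age")
      .createOrReplaceTempView("ods_table")
    spark.sql("cache table ods_table")
    spark.sql("cache table dwd_table1 as select * from ods_table where age>=25")
    spark.sql("cache table dwd_table2 as select * from dwd_table1 where name='p1'")

    val cachedBefore = spark.catalog.isCached("dwd_table2")
    spark.catalog.dropTempView("dwd_table1")
    val cachedAfter = spark.catalog.isCached("dwd_table2")
    (cachedBefore, cachedAfter)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local").appName("dropTempViewCacheCheck").getOrCreate()
    val (before, after) = cachedBeforeAndAfterDrop(spark)
    println(s"dwd_table2 cached before dropping dwd_table1: $before")
    println(s"dwd_table2 cached after dropping dwd_table1:  $after")
    spark.stop()
  }
}
```

Running this against each Spark version of interest gives a quick way to compare cache behavior without opening the web UI.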

      Attachments

        1. screenshot-1.png
          19 kB
          ant_nebula

            People

              Assignee: Unassigned
              Reporter: ant_nebula
              Votes: 0
              Watchers: 4
