SPARK-33228

Don't uncache data when replacing an existing view having the same plan


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.8, 3.0.2, 3.1.0
    • Fix Version/s: 2.4.8, 3.0.2, 3.1.0
    • Component/s: SQL
    • Labels: None

    Description

      SPARK-30494 updated the `CreateViewCommand` code to implicitly drop the cache when replacing an existing view. However, this change drops the cache even when the replacement view has the same logical plan as the existing one. The following sequence of queries reproduces the problem:

      scala> val df = spark.range(1).selectExpr("id a", "id b")
      scala> df.cache()
      scala> df.explain()
      == Physical Plan ==
      *(1) ColumnarToRow
      +- InMemoryTableScan [a#2L, b#3L]
            +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
                  +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
                     +- *(1) Range (0, 1, step=1, splits=4)
      
      
      scala> df.createOrReplaceTempView("t")
      scala> sql("select * from t").explain()
      == Physical Plan ==
      *(1) ColumnarToRow
      +- InMemoryTableScan [a#2L, b#3L]
            +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
                  +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
                     +- *(1) Range (0, 1, step=1, splits=4)
      
      
      // Re-running the same `df.createOrReplaceTempView("t")` call drops the cache: the plan below no longer contains an InMemoryTableScan
      scala> df.createOrReplaceTempView("t")
      scala> sql("select * from t").explain()
      == Physical Plan ==
      *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
      +- *(1) Range (0, 1, step=1, splits=4)
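
The behavior requested in the title can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Plan`, `ToyCatalog`, and their methods are hypothetical names invented for illustration and are not Spark classes, and plain case-class equality stands in for whatever plan comparison Spark itself performs. It shows the intended rule: when a view is replaced, the cached data is dropped only if the new plan differs from the old one.

```scala
// Toy model only: these types are hypothetical and are NOT Spark classes.
final case class Plan(description: String)

final class ToyCatalog {
  private var views = Map.empty[String, Plan] // view name -> plan
  private var cached = Set.empty[Plan]        // plans whose data is cached

  def cache(plan: Plan): Unit = cached += plan
  def isCached(plan: Plan): Boolean = cached.contains(plan)

  // Replace the view; drop the old plan's cache only if the plan changes.
  def createOrReplaceTempView(name: String, plan: Plan): Unit = {
    views.get(name).filter(_ != plan).foreach(old => cached -= old)
    views += (name -> plan)
  }
}

val catalog = new ToyCatalog
val plan = Plan("Project [id AS a, id AS b] <- Range(0, 1)")
catalog.cache(plan)
catalog.createOrReplaceTempView("t", plan)
catalog.createOrReplaceTempView("t", plan) // same plan: the cache is kept
```

In this model, re-registering `t` with an identical plan leaves the cached entry in place, while registering a different plan under the same name evicts it, which still covers the case SPARK-30494 was addressing.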
      

          People

            Assignee: maropu Takeshi Yamamuro
            Reporter: maropu Takeshi Yamamuro