Description
SPARK-30494 updated the `CreateViewCommand` code to implicitly drop the cache when replacing an existing view. However, this change drops the cache even when the view is replaced by one with the same logical plan. The following sequence of queries reproduces the issue:
scala> val df = spark.range(1).selectExpr("id a", "id b")
scala> df.cache()
scala> df.explain()
== Physical Plan ==
*(1) ColumnarToRow
+- InMemoryTableScan [a#2L, b#3L]
      +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
               +- *(1) Range (0, 1, step=1, splits=4)

scala> df.createOrReplaceTempView("t")
scala> sql("select * from t").explain()
== Physical Plan ==
*(1) ColumnarToRow
+- InMemoryTableScan [a#2L, b#3L]
      +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
               +- *(1) Range (0, 1, step=1, splits=4)

// If one re-runs the same query `df.createOrReplaceTempView("t")`, the cache's swept away
scala> df.createOrReplaceTempView("t")
scala> sql("select * from t").explain()
== Physical Plan ==
*(1) Project [id#0L AS a#2L, id#0L AS b#3L]
+- *(1) Range (0, 1, step=1, splits=4)
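The same reproduction can be checked programmatically rather than by reading the plans. The following is a minimal sketch, assuming Spark is on the classpath and a local master is acceptable. The object name `ViewReplaceCacheCheck` and the helper `check` are hypothetical names introduced for illustration. It relies on `Dataset.storageLevel`, which reports `StorageLevel.NONE` when the Dataset is not registered in the cache.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical helper for illustration; not part of Spark.
object ViewReplaceCacheCheck {

  // Returns (cached after first view creation, cached after replacing the view).
  def check(spark: SparkSession): (Boolean, Boolean) = {
    val df = spark.range(1).selectExpr("id a", "id b")
    df.cache()

    // First call creates the temp view; nothing is replaced yet.
    df.createOrReplaceTempView("t")
    val cachedAfterCreate = df.storageLevel != StorageLevel.NONE

    // Second call replaces the view with one that has the same logical plan.
    df.createOrReplaceTempView("t")
    val cachedAfterReplace = df.storageLevel != StorageLevel.NONE

    (cachedAfterCreate, cachedAfterReplace)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]") // assumption: a local session is sufficient here
      .appName("view-replace-cache-check")
      .getOrCreate()
    try {
      val (afterCreate, afterReplace) = check(spark)
      println(s"cached after create:  $afterCreate")
      println(s"cached after replace: $afterReplace")
    } finally {
      spark.stop()
    }
  }
}
```

On a build affected by this issue, the second value would be expected to be `false`, matching the last plan in the transcript; if the cache were preserved for identical plans, both values should be `true`.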