[SPARK-33228] Don't uncache data when replacing an existing view having the same plan - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.8, 3.0.2, 3.1.0
Fix Version/s: 2.4.8, 3.0.2, 3.1.0
Component/s: SQL
Labels: None

Description

~~SPARK-30494~~ updated the `CreateViewCommand` code to implicitly drop the cache when replacing an existing view. However, that change drops the cache even when the replacement view has the same logical plan. The following sequence of queries reproduces the issue:

scala> val df = spark.range(1).selectExpr("id a", "id b")
scala> df.cache()
scala> df.explain()
== Physical Plan ==
*(1) ColumnarToRow
+- InMemoryTableScan [a#2L, b#3L]
      +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
               +- *(1) Range (0, 1, step=1, splits=4)


scala> df.createOrReplaceTempView("t")
scala> sql("select * from t").explain()
== Physical Plan ==
*(1) ColumnarToRow
+- InMemoryTableScan [a#2L, b#3L]
      +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
               +- *(1) Range (0, 1, step=1, splits=4)


// Re-running the same `df.createOrReplaceTempView("t")` call drops the cache, so the plan no longer uses InMemoryRelation
scala> df.createOrReplaceTempView("t")
scala> sql("select * from t").explain()
== Physical Plan ==
*(1) Project [id#0L AS a#2L, id#0L AS b#3L]
+- *(1) Range (0, 1, step=1, splits=4)
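
The behavior this issue asks for can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Plan`, `ToyCatalog`, and `ToyViewDemo` are hypothetical names invented for illustration and are not Spark classes, and plain equality on a case class stands in for whatever plan comparison Spark performs. The idea shown is simply to uncache the old plan only when the replacement view's plan differs.

```scala
// Toy model only: these types are hypothetical and are NOT Spark's classes.
final case class Plan(description: String)

final class ToyCatalog {
  private var views = Map.empty[String, Plan]
  private var cached = Set.empty[Plan]

  def cache(plan: Plan): Unit = cached += plan
  def isCached(plan: Plan): Boolean = cached.contains(plan)

  // Replace a view; drop the old plan's cached data only if the plan actually changes.
  def createOrReplaceTempView(name: String, plan: Plan): Unit = {
    views.get(name).foreach { old =>
      if (old != plan) cached -= old // same plan: keep the cache
    }
    views += (name -> plan)
  }
}

object ToyViewDemo {
  // Returns (cached after same-plan replace, cached after different-plan replace).
  def run(): (Boolean, Boolean) = {
    val catalog = new ToyCatalog
    val p1 = Plan("range(1) as a, b")
    val p2 = Plan("range(2) as a, b")

    catalog.cache(p1)
    catalog.createOrReplaceTempView("t", p1)
    catalog.createOrReplaceTempView("t", p1) // re-run with the same plan
    val afterSamePlan = catalog.isCached(p1)

    catalog.createOrReplaceTempView("t", p2) // replace with a different plan
    val afterDifferentPlan = catalog.isCached(p1)

    (afterSamePlan, afterDifferentPlan)
  }
}
```

In this model, re-running the same `createOrReplaceTempView("t", p1)` call leaves the cache intact, while replacing the view with a different plan still drops it.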

Issue Links

links to

[Github] Pull Request #30140 (maropu)

[Github] Pull Request #30152 (viirya)

[Github] Pull Request #30157 (maropu)

People

Assignee: Takeshi Yamamuro

Reporter: Takeshi Yamamuro

Votes: 0

Watchers: 3

Dates

Created: 23/Oct/20 13:18

Updated: 27/Oct/20 23:59

Resolved: 25/Oct/20 23:17