Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-33507 Improve and fix cache behavior in v1 and v2
  3. SPARK-33941

ALTER TABLE .. DROP PARTITION doesn't invalidate the cache

    Export: XML | Word | Printable | JSON

Details

    • Type: Sub-task
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.7, 3.0.1, 3.1.0, 3.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      1. Create a partitioned table:

      scala> sql("CREATE TABLE tbl2 (id int, part int) USING parquet PARTITIONED BY (part)")
      res31: org.apache.spark.sql.DataFrame = []
      

      2. Create partitions:

      scala> sql("INSERT INTO tbl2 PARTITION (part=0) SELECT 0")
      res32: org.apache.spark.sql.DataFrame = []
      
      scala> sql("INSERT INTO tbl2 PARTITION (part=1) SELECT 1")
      res33: org.apache.spark.sql.DataFrame = []
      

      3. Create dataframes from the table:

      scala> val df1 = spark.table("tbl2")
      df1: org.apache.spark.sql.DataFrame = [id: int, part: int]
      
      scala> val df2 = spark.table("tbl2")
      df2: org.apache.spark.sql.DataFrame = [id: int, part: int]
      

      4. Cache df2, then materialize both dataframes to populate the cache:

      scala> df2.cache
      res34: df2.type = [id: int, part: int]
      
      scala> df1.show(false)
      +---+----+
      |id |part|
      +---+----+
      |0  |0   |
      |1  |1   |
      +---+----+
      
      
      scala> df2.show(false)
      +---+----+
      |id |part|
      +---+----+
      |0  |0   |
      |1  |1   |
      +---+----+
      

      5. Drop a partition:

      scala> sql("ALTER TABLE tbl2 DROP PARTITION(part=0)")
      res37: org.apache.spark.sql.DataFrame = []
      
      scala> df1.show(false)
      +---+----+
      |id |part|
      +---+----+
      |0  |0   |
      |1  |1   |
      +---+----+
      
      
      scala> df2.show(false)
      +---+----+
      |id |part|
      +---+----+
      |0  |0   |
      |1  |1   |
      +---+----+
      

      Note that after the DROP PARTITION, both df1 and df2 incorrectly still show the dropped partition (part=0), because the cached data is not invalidated. For comparison, the same scenario without caching returns the correct result:

      scala> sql("CREATE TABLE tbl3 (id int, part int) USING parquet PARTITIONED BY (part)")
      res40: org.apache.spark.sql.DataFrame = []
      
      scala> sql("INSERT INTO tbl3 PARTITION (part=0) SELECT 0")
      res41: org.apache.spark.sql.DataFrame = []
      
      scala> sql("INSERT INTO tbl3 PARTITION (part=1) SELECT 1")
      res42: org.apache.spark.sql.DataFrame = []
      
      scala> val df3 = spark.table("tbl3")
      df3: org.apache.spark.sql.DataFrame = [id: int, part: int]
      
      scala> df3.show(false)
      +---+----+
      |id |part|
      +---+----+
      |0  |0   |
      |1  |1   |
      +---+----+
      
      
      scala> sql("ALTER TABLE tbl3 DROP PARTITION(part=0)")
      res44: org.apache.spark.sql.DataFrame = []
      
      scala> df3.show(false)
      +---+----+
      |id |part|
      +---+----+
      |1  |1   |
      +---+----+
      

      Attachments

        Issue Links

          Activity

            People

              Assignee: Unassigned
              Reporter: maxgekk (Max Gekk)
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: