Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
2.2.0
Description
When we overwrite a partitioned data source table, currently Spark will truncate the entire table to write new data, or truncate a bunch of partitions according to the given static partitions.
For example, INSERT OVERWRITE tbl ... will truncate the entire table, INSERT OVERWRITE tbl PARTITION (a=1, b) will truncate all the partitions that starts with a=1.
This behavior is kind of reasonable as we can know which partitions will be overwritten before runtime. However, hive has a different behavior that it only overwrites related partitions, e.g. INSERT OVERWRITE tbl SELECT 1,2,3 will only overwrite partition a=2, b=3, assuming tbl has only one data column and is partitioned by a and b.
It seems better if we can follow hive's behavior.
Attachments
Issue Links
- is related to
-
SPARK-28945 Allow concurrent writes to different partitions with dynamic partition overwrite
- Resolved
-
SPARK-30510 Publicly document options under spark.sql.*
- Resolved
- links to