When we overwrite a partitioned data source table, Spark currently truncates the entire table before writing the new data, or truncates a set of partitions matching the given static partition values.
For example, INSERT OVERWRITE tbl ... truncates the entire table, while INSERT OVERWRITE tbl PARTITION (a=1, b) truncates all partitions that start with a=1.
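For concreteness, here is a minimal sketch of the current behavior, assuming a hypothetical table tbl with one data column and two partition columns (all names and values are illustrative, not taken from a real workload):

    -- Hypothetical data source table: one data column, partitioned by a and b.
    CREATE TABLE tbl (data INT, a INT, b INT) USING parquet PARTITIONED BY (a, b);

    -- Currently truncates the whole table before writing.
    INSERT OVERWRITE TABLE tbl SELECT 1, 2, 3;

    -- Currently truncates every partition with a=1 (a is static, b is dynamic),
    -- regardless of which b values the query actually produces.
    INSERT OVERWRITE TABLE tbl PARTITION (a=1, b) SELECT 1, 2;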
This behavior is somewhat reasonable, since the partitions to be overwritten are known before runtime. Hive, however, behaves differently: it only overwrites the partitions that the new data actually touches. For example, INSERT OVERWRITE tbl SELECT 1,2,3 only overwrites the partition a=2, b=3, assuming tbl has a single data column and is partitioned by a and b.
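Using the same hypothetical table, Hive's dynamic-partition overwrite would look roughly like this (note that Hive typically requires an explicit PARTITION clause and nonstrict dynamic partition mode for fully dynamic partition values):

    -- Allow all partition values to be determined dynamically at runtime.
    SET hive.exec.dynamic.partition.mode=nonstrict;

    -- Only the partitions present in the query output are replaced:
    -- here, just (a=2, b=3); all other partitions are left untouched.
    INSERT OVERWRITE TABLE tbl PARTITION (a, b) SELECT 1, 2, 3;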
It seems better to follow Hive's behavior here.