Description
The example below illustrates the issue:
spark-sql> create table tbl (col0 int, part int) partitioned by (part);
spark-sql> insert into tbl partition (part = 0) select 0;
spark-sql> set spark.sql.statistics.size.autoUpdate.enabled=true;
spark-sql> alter table tbl add partition (part = 1);
There are no stats:
spark-sql> describe table extended tbl;
col0	int	NULL
part	int	NULL
# Partition Information
# col_name	data_type	comment
part	int	NULL

# Detailed Table Information
Database	default
Table	tbl
Owner	maximgekk
Created Time	Tue Jan 12 12:00:03 MSK 2021
Last Access	UNKNOWN
Created By	Spark 3.2.0-SNAPSHOT
Type	MANAGED
Provider	hive
Table Properties	[transient_lastDdlTime=1610442003]
Location	file:/Users/maximgekk/proj/fix-stats-in-add-partition/spark-warehouse/tbl
Serde Library	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat	org.apache.hadoop.mapred.TextInputFormat
OutputFormat	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties	[serialization.format=1]
Partition Provider	Catalog
As we can see, the output has no Statistics row even though spark.sql.statistics.size.autoUpdate.enabled is set to true. By contrast, ALTER TABLE .. DROP PARTITION does update stats:
spark-sql> alter table tbl drop partition (part = 1);
spark-sql> describe table extended tbl;
col0	int	NULL
part	int	NULL
# Partition Information
# col_name	data_type	comment
part	int	NULL

# Detailed Table Information
...
Statistics	2 bytes
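The comparison above can also be checked programmatically. The following is a minimal sketch, not part of the original report: the helper name table_statistics is hypothetical, and it assumes each row has the (col_name, data_type, comment) shape of the DESCRIBE TABLE EXTENDED output shown above. The sample rows are abridged from the two outputs in this report.

```python
from typing import Iterable, Optional, Sequence


def table_statistics(rows: Iterable[Sequence[Optional[str]]]) -> Optional[str]:
    """Return the value of the 'Statistics' row, or None if the row is absent.

    Hypothetical helper: each row is assumed to be
    (col_name, data_type, comment), as in DESCRIBE TABLE EXTENDED output.
    """
    for row in rows:
        if row and row[0] == "Statistics":
            return row[1]
    return None


# Rows abridged from the output after ALTER TABLE .. ADD PARTITION.
after_add_partition = [
    ("col0", "int", None),
    ("part", "int", None),
    ("# Detailed Table Information", "", ""),
    ("Table", "tbl", ""),
    ("Partition Provider", "Catalog", ""),
]

# After ALTER TABLE .. DROP PARTITION a Statistics row is present.
# Its position in the list here is illustrative only.
after_drop_partition = after_add_partition + [("Statistics", "2 bytes", "")]

print(table_statistics(after_add_partition))   # None
print(table_statistics(after_drop_partition))  # 2 bytes
```

With a live PySpark session (assumed here to be named spark), the rows could presumably be obtained with spark.sql("DESCRIBE TABLE EXTENDED tbl").collect() and passed to the helper unchanged, since each returned row can be indexed by position.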
Issue Links
- fixes SPARK-34062 Call updateTableStats() from AlterTableAddPartitionCommand (Resolved)