Description
The system should update the table statistics automatically when the user sets spark.sql.statistics.size.autoUpdate.enabled to true. Currently this property has no effect, whether it is enabled or disabled. This is similar to Hive's auto-gather feature, where statistics are computed automatically by default when the feature is enabled (see the illustrative snippet below the reference).
Reference:
https://cwiki.apache.org/confluence/display/Hive/StatsDev
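For context, a minimal sketch of how the property would be enabled for a session before running DML; this snippet is illustrative and not output captured from the original reproduction:

// enable the size auto-update property for the current session
scala> spark.conf.set("spark.sql.statistics.size.autoUpdate.enabled", "true")

// equivalent SQL form
scala> spark.sql("SET spark.sql.statistics.size.autoUpdate.enabled=true")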
Reproducing steps:
scala> spark.sql("create table table1 (name string,age int) stored as parquet")
scala> spark.sql("insert into table1 select 'a',29")
res2: org.apache.spark.sql.DataFrame = []
scala> spark.sql("desc extended table1").show(false)
+----------------------------+---------------------------------------------------------------+-------+
|col_name                    |data_type                                                      |comment|
+----------------------------+---------------------------------------------------------------+-------+
|name                        |string                                                         |null   |
|age                         |int                                                            |null   |
|                            |                                                               |       |
|# Detailed Table Information|                                                               |       |
|Database                    |default                                                        |       |
|Table                       |table1                                                         |       |
|Owner                       |Administrator                                                  |       |
|Created Time                |Sun Apr 07 23:41:56 IST 2019                                   |       |
|Last Access                 |Thu Jan 01 05:30:00 IST 1970                                   |       |
|Created By                  |Spark 2.4.1                                                    |       |
|Type                        |MANAGED                                                        |       |
|Provider                    |hive                                                           |       |
|Table Properties            |[transient_lastDdlTime=1554660716]                             |       |
|Location                    |file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1  |       |
|Serde Library               |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe    |       |
|InputFormat                 |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat  |       |
|OutputFormat                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat |       |
|Storage Properties          |[serialization.format=1]                                       |       |
|Partition Provider          |Catalog                                                        |       |
+----------------------------+---------------------------------------------------------------+-------+
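Expected versus current behaviour can be checked as follows; this is a hedged sketch assuming the same session as above, and the Statistics row shown in the comments is illustrative rather than captured output:

scala> spark.sql("set spark.sql.statistics.size.autoUpdate.enabled=true")
scala> spark.sql("insert into table1 select 'b',30")
scala> spark.sql("desc extended table1").filter("col_name = 'Statistics'").show(false)
// Expected once this issue is addressed: a Statistics row reflecting the new table size, e.g.
//   |Statistics|<sizeInBytes> bytes|       |
// Current behaviour: no Statistics row appears unless statistics are gathered explicitly, e.g.
// scala> spark.sql("analyze table table1 compute statistics noscan")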