Description
Currently, the min/max statistics of a timestamp column are stored in UTC in the metastore, and desc formatted on that column also shows them in UTC.
As a result, for users whose session time zone is not UTC, the displayed column stats do not match the actual values in the table, which causes confusion.
For example:
spark-sql> create table tab_ts_master (ts timestamp) using parquet;
spark-sql> insert into tab_ts_master values make_timestamp(2022, 1, 1, 0, 0, 1.123456), make_timestamp(2022, 1, 3, 0, 0, 2.987654);
spark-sql> select * from tab_ts_master;
2022-01-01 00:00:01.123456
2022-01-03 00:00:02.987654
spark-sql> set spark.sql.session.timeZone;
spark.sql.session.timeZone    Asia/Shanghai
spark-sql> analyze table tab_ts_master compute statistics for all columns;
spark-sql> desc formatted tab_ts_master ts;
col_name          ts
data_type         timestamp
comment           NULL
min               2021-12-31 16:00:01.123456
max               2022-01-02 16:00:02.987654
num_nulls         0
distinct_count    2
avg_col_len       8
max_col_len       8
histogram         NULL
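In the session above, the reported min and max are each 8 hours behind the selected rows, which is the offset between UTC and Asia/Shanghai. The following is a minimal Python sketch of that arithmetic only, not Spark's implementation. The helper name `stat_to_session_time` is hypothetical, and a fixed UTC+08:00 offset is assumed in place of a full time zone database lookup, which holds for the dates in this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Shanghai is a fixed UTC+08:00 offset (no DST) for these dates.
SESSION_TZ = timezone(timedelta(hours=8), "Asia/Shanghai")

FMT = "%Y-%m-%d %H:%M:%S.%f"


def stat_to_session_time(utc_text: str, tz: timezone = SESSION_TZ) -> str:
    """Hypothetical helper: read a stat string as UTC, render it in the session time zone."""
    utc_value = datetime.strptime(utc_text, FMT).replace(tzinfo=timezone.utc)
    return utc_value.astimezone(tz).strftime(FMT)


# min/max exactly as printed by "desc formatted tab_ts_master ts"
shown_min = "2021-12-31 16:00:01.123456"
shown_max = "2022-01-02 16:00:02.987654"

print(stat_to_session_time(shown_min))  # 2022-01-01 00:00:01.123456
print(stat_to_session_time(shown_max))  # 2022-01-03 00:00:02.987654
```

After conversion, both values match the rows returned by select * from tab_ts_master, which is what a user in that session would expect the stats to show.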