Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Hive 3.1
Description
Here I used a simplified sample to illustrate the issue.
When a statement contains multiple insert overwrite clauses, only the partitions written by the last clause get column statistics. In the sample here, only the partition (ws_sold_date_sk=__HIVE_DEFAULT_PARTITION__), which is written by the last insert clause, has column statistics.
Since "hive.stats.column.autogather" is true by default, we expect column statistics to be calculated for all partitions.
create table web_sales (
  ws_sold_time_sk bigint,
  ws_ship_date_sk bigint,
  ws_item_sk bigint
)
partitioned by (ws_sold_date_sk bigint)
stored as orc;

from anotherdb.web_sales ws
insert overwrite table web_sales partition (ws_sold_date_sk)
  select ws.ws_sold_time_sk, ws.ws_ship_date_sk, ws.ws_item_sk, ws.ws_sold_date_sk
  where ws.ws_sold_date_sk is not null
insert overwrite table web_sales partition (ws_sold_date_sk)
  select ws.ws_sold_time_sk, ws.ws_ship_date_sk, ws.ws_item_sk, ws.ws_sold_date_sk
  where ws.ws_sold_date_sk is null
  sort by ws.ws_sold_date_sk
;
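
One way to check which partitions ended up with column statistics after running the statements above is sketched below. The partition value 2451545 is a hypothetical placeholder, not taken from the original data; substitute any value returned by "show partitions". The final "analyze table" statement is only a possible manual workaround, not something confirmed as a fix for this issue.

```sql
-- List the partitions produced by the multi-insert statement.
show partitions web_sales;

-- Inspect column-level statistics for one column in one partition.
-- 2451545 is a hypothetical partition value; use one listed above.
-- If statistics were gathered, fields such as min, max, num_nulls and
-- distinct_count are populated; otherwise they are empty.
describe formatted web_sales partition (ws_sold_date_sk=2451545) ws_item_sk;

-- Possible workaround (assumption, not verified against this bug):
-- compute column statistics explicitly for all partitions.
analyze table web_sales partition (ws_sold_date_sk) compute statistics for columns;
```

Running the describe statement against a partition written by the first insert clause and against the default partition written by the last clause should make the difference described above visible.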