Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
None
-
None
Description
hive.optimize.sort.dynamic.partition=true introduces new stage to reduce number of writes in dynamic partitioning usecase. Earlier SortedDynPartitionOptimizer added this new operator via Optimizer.java and the stats for the newly added operator was populated via StatsRulesProcFactory$ReduceSinkStatsRule.
However, with "HIVE-20703" this got changed. This is moved to TezCompiler for cost based decision. Though the operator gets added correctly, the stats for this does not get added (as it runs after runStatsAnnotation()).
This causes reducer count to be mis-estimated in the query.
e.g For the following query, reducer_2 would be estimated as "2" instead of "1009". This causes huge delay in the runtime. explain from tpcds_xtext_1000.store_sales ss insert overwrite table store_sales partition (ss_sold_date_sk) select ss.ss_sold_time_sk, ss.ss_item_sk, ss.ss_customer_sk, ss.ss_cdemo_sk, ss.ss_hdemo_sk, ss.ss_addr_sk, ss.ss_store_sk, ss.ss_promo_sk, ss.ss_ticket_number, ss.ss_quantity, ss.ss_wholesale_cost, ss.ss_list_price, ss.ss_sales_price, ss.ss_ext_discount_amt, ss.ss_ext_sales_price, ss.ss_ext_wholesale_cost, ss.ss_ext_list_price, ss.ss_ext_tax, ss.ss_coupon_amt, ss.ss_net_paid, ss.ss_net_paid_inc_tax, ss.ss_net_profit, ss.ss_sold_date_sk where ss.ss_sold_date_sk is not null insert overwrite table store_sales partition (ss_sold_date_sk) select ss.ss_sold_time_sk, ss.ss_item_sk, ss.ss_customer_sk, ss.ss_cdemo_sk, ss.ss_hdemo_sk, ss.ss_addr_sk, ss.ss_store_sk, ss.ss_promo_sk, ss.ss_ticket_number, ss.ss_quantity, ss.ss_wholesale_cost, ss.ss_list_price, ss.ss_sales_price, ss.ss_ext_discount_amt, ss.ss_ext_sales_price, ss.ss_ext_wholesale_cost, ss.ss_ext_list_price, ss.ss_ext_tax, ss.ss_coupon_amt, ss.ss_net_paid, ss.ss_net_paid_inc_tax, ss.ss_net_profit, ss.ss_sold_date_sk where ss.ss_sold_date_sk is null distribute by ss.ss_item_sk ;
Attachments
Attachments
Issue Links
- is caused by
-
HIVE-20703 Put dynamic sort partition optimization under cost based decision
- Closed