Details
Type: Bug
Status: In Progress
Priority: Major
Resolution: Unresolved
Description
A Sort node placed after an Exchange does not start sorting until all of its input has been received, which adds a lot of latency to the query.
It is not clear whether this optimization would still make sense for a Scan followed by a Sort that runs in the same thread.
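To illustrate the difference, here is a minimal Python sketch, not Impala's actual sorter (which is written in C++ and works on row batches with spilling). It contrasts buffering everything before sorting with sorting fixed-size runs while batches are still arriving and merging them at the end. The function names, the `run_size` parameter, and the two-column key are hypothetical and chosen only for illustration; NULL ordering is ignored.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical sort key: (ss_sold_date_sk, ss_quantity), both DESC as in the plan.
Row = Tuple[int, int]


def blocking_sort(batches: Iterable[List[Row]]) -> List[Row]:
    """Buffer every batch, then sort once after the last batch has arrived."""
    buffered: List[Row] = []
    for batch in batches:
        buffered.extend(batch)  # no sorting work overlaps with receiving
    return sorted(buffered, reverse=True)


def incremental_sort(batches: Iterable[List[Row]], run_size: int = 4) -> List[Row]:
    """Sort fixed-size runs as batches arrive, then merge the sorted runs."""
    runs: List[List[Row]] = []
    pending: List[Row] = []
    for batch in batches:
        pending.extend(batch)
        # Sort a run as soon as enough rows are available, while the
        # sender is still producing later batches.
        while len(pending) >= run_size:
            runs.append(sorted(pending[:run_size], reverse=True))
            pending = pending[run_size:]
    if pending:
        runs.append(sorted(pending, reverse=True))
    # Only the final merge has to wait for the end of the input.
    return list(heapq.merge(*runs, reverse=True))
```

Both functions return the same ordering; the second one only moves most of the comparison work into the time spent waiting on the exchange, which is the latency this issue is about.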
Query
insert into tpcds_1000_parquet.store_sales_insert partition(ss_sold_date_sk, ss_quantity) /*+ clustered*/
select
ss_sold_time_sk,
ss_item_sk,
ss_customer_sk,
ss_cdemo_sk,
ss_hdemo_sk,
ss_addr_sk,
ss_store_sk,
ss_promo_sk,
ss_ticket_number,
ss_wholesale_cost,
ss_list_price,
ss_sales_price,
ss_ext_discount_amt,
ss_ext_sales_price,
ss_ext_wholesale_cost,
ss_ext_list_price,
ss_ext_tax,
ss_coupon_amt,
ss_net_paid,
ss_net_paid_inc_tax,
ss_net_profit,
ss_sold_date_sk,
ss_quantity
from store_sales
Plan
WRITE TO HDFS [tpcds_1000_parquet.store_sales_insert, OVERWRITE=false, PARTITION-KEYS=(ss_sold_date_sk,ss_quantity)]
| partitions=180576
| hosts=15 per-host-mem=17.88GB
|
02:SORT
| order by: ss_sold_date_sk DESC NULLS LAST, ss_quantity DESC NULLS LAST
| hosts=15 per-host-mem=1.45GB
| tuple-ids=1 row-size=100B cardinality=2879987999
|
01:EXCHANGE [HASH(ss_sold_date_sk,ss_quantity)]
| hosts=15 per-host-mem=0B
| tuple-ids=0 row-size=100B cardinality=2879987999
|
00:SCAN HDFS [tpcds_1000_parquet.store_sales, RANDOM]
partitions=1824/1824 files=1824 size=189.24GB
table stats: 2879987999 rows total
column stats: all
hosts=15 per-host-mem=88.00MB
tuple-ids=0 row-size=100B cardinality=2879987999
Issue Links
- is related to IMPALA-6692 When partition exchange is followed by sort each sort node becomes a synchronization point across the cluster (Reopened)