Details
-
New Feature
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
Impala 2.6.0
Description
The Union operator can be a bottleneck when combining large data sets. The operator can be sped by acting as a passthrough operator when the input and output tuple layout matches exactly.
For the query below UNION ALL consumed 25% of overall time.
select count(*) from (select c_last_name,c_first_name,d_date, sum(q18.c3) c3,count(*) c4 from ( select c_last_name,c_first_name,d_date,1 as c3 from (select c_last_name,c_first_name,d_date from (select c_last_name, c_first_name, d_date, q14.c3 c3, q14.c4 c4 from (select c_last_name,c_first_name,d_date,sum(q13.c3) c3,count(*) c4 from ( select c_last_name,c_first_name,d_date,1 as c3 from customer, date_dim, store_sales where (d_month_seq between 1215 and 1226) and (ss_customer_sk = c_customer_sk) and (ss_sold_date_sk = d_date_sk) union all select c_last_name,c_first_name,d_date,-1 as c3 from customer, date_dim, catalog_sales where (d_month_seq between 1215 and 1226) and (cs_bill_customer_sk = c_customer_sk) and (cs_sold_date_sk = d_date_sk) ) as q13 group by c_last_name, c_first_name, d_date ) as q14 where ((q14.c4 - case when (q14.c3 >= 0) then q14.c3 else -(q14.c3) end) >= 2) ) as q15 ) as q16 union all select c_last_name,c_first_name,d_date,-1 as c3 from customer,date_dim,web_sales where (d_month_seq between 1215 and 1226) and (ws_bill_customer_sk = c_customer_sk) and (ws_sold_date_sk = d_date_sk) ) as q18 group by c_last_name, c_first_name, d_date ) as q19 where ((q19.c4 - case when (q19.c3 >= 0) then q19.c3 else -(q19.c3) end) >= 2)
Summary
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail --------------------------------------------------------------------------------------------------------------------------------- 31:AGGREGATE 1 414.293ms 414.293ms 1 1 92.00 KB -1.00 B FINALIZE 30:EXCHANGE 1 151.874us 151.874us 20 1 0 -1.00 B UNPARTITIONED 19:AGGREGATE 20 226.910ms 294.268ms 20 1 12.00 KB 10.00 MB 29:AGGREGATE 20 7s391ms 8s845ms 44.51M 75.60B 1.13 GB 933.80 GB FINALIZE 28:EXCHANGE 20 517.932ms 631.483ms 233.99M 75.60B 0 0 HASH(c_last_name,c_first_na... 18:AGGREGATE 20 17s312ms 25s953ms 233.99M 75.60B 1.76 GB 6626.13 GB STREAMING 00:UNION 20 19s022ms 28s694ms 2.27B 75.60B 948.10 KB 0 |--17:HASH JOIN 20 3s257ms 4s993ms 2.16B 10.80B 2.06 MB 298.19 KB INNER JOIN, BROADCAST | |--27:EXCHANGE 20 20.031us 33.432us 365 7.30K 0 0 BROADCAST | | 14:SCAN HDFS 1 20.955ms 20.955ms 365 7.30K 5.86 MB 48.00 MB tpcds_15000_decimal_parquet... | 16:HASH JOIN 20 7s244ms 10s364ms 2.16B 10.80B 394.04 MB 208.20 MB INNER JOIN, BROADCAST | |--26:EXCHANGE 20 177.055ms 198.769ms 3.85M 3.85M 0 0 BROADCAST | | 13:SCAN HDFS 18 12.368ms 17.726ms 3.85M 3.85M 35.96 MB 72.00 MB tpcds_15000_decimal_parquet... | 15:SCAN HDFS 20 557.460ms 825.397ms 2.16B 10.80B 120.83 MB 88.00 MB tpcds_15000_decimal_parquet... 25:AGGREGATE 20 17s407ms 24s977ms 109.12M 64.80B 2.01 GB 5679.60 GB FINALIZE 24:EXCHANGE 20 1s422ms 2s113ms 593.83M 64.80B 0 0 HASH(c_last_name,c_first_na... 12:AGGREGATE 20 1m25s 1m48s 593.83M 64.80B 4.01 GB 5679.60 GB STREAMING 01:UNION 20 1m39s 2m3s 12.32B 64.80B 883.75 KB 0 |--11:HASH JOIN 20 6s108ms 7s925ms 4.26B 21.60B 2.06 MB 298.19 KB INNER JOIN, BROADCAST | |--23:EXCHANGE 20 18.296us 32.651us 365 7.30K 0 0 BROADCAST | | 08:SCAN HDFS 1 25.291ms 25.291ms 365 7.30K 5.86 MB 48.00 MB tpcds_15000_decimal_parquet... | 10:HASH JOIN 20 13s701ms 17s430ms 4.26B 21.60B 394.04 MB 208.20 MB INNER JOIN, BROADCAST | |--22:EXCHANGE 20 159.064ms 181.949ms 3.85M 3.85M 0 0 BROADCAST | | 07:SCAN HDFS 18 8.930ms 11.820ms 3.85M 3.85M 35.96 MB 72.00 MB tpcds_15000_decimal_parquet... | 09:SCAN HDFS 20 999.371ms 1s237ms 4.27B 21.60B 212.29 MB 88.00 MB tpcds_15000_decimal_parquet... 06:HASH JOIN 20 11s606ms 14s429ms 8.06B 43.20B 2.06 MB 298.19 KB INNER JOIN, BROADCAST |--21:EXCHANGE 20 16.435us 27.733us 365 7.30K 0 0 BROADCAST | 03:SCAN HDFS 1 19.860ms 19.860ms 365 7.30K 5.86 MB 48.00 MB tpcds_15000_decimal_parquet... 05:HASH JOIN 20 21s797ms 27s529ms 8.06B 43.20B 394.04 MB 208.20 MB INNER JOIN, BROADCAST |--20:EXCHANGE 20 152.883ms 181.123ms 3.85M 3.85M 0 0 BROADCAST | 02:SCAN HDFS 18 8.797ms 12.102ms 3.85M 3.85M 36.02 MB 72.00 MB tpcds_15000_decimal_parquet... 04:SCAN HDFS 20 1s807ms 2s247ms 8.25B 43.20B 212.40 MB 88.00 MB tpcds_15000_decimal_parquet...
Attachments
Attachments
Issue Links
- breaks
-
IMPALA-5188 DCHECK in UnionNode::GetNextPassThrough with GROUP BY, AVG
-
- Resolved
-