Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3586

Passthrough mode for Union ALL operator

    XMLWordPrintableJSON

Details

    Description

      The Union operator can be a bottleneck when combining large data sets. The operator can be sped by acting as a passthrough operator when the input and output tuple layout matches exactly.

      For the query below UNION ALL consumed 25% of overall time.

       select count(*)
                  from
                    (select c_last_name,c_first_name,d_date, sum(q18.c3) c3,count(*) c4
                     from
                       (
                        select
                          c_last_name,c_first_name,d_date,1 as c3
                        from
                          (select c_last_name,c_first_name,d_date	
                           from
                             (select c_last_name, c_first_name, d_date, q14.c3 c3, q14.c4 c4
                              from
                                (select
                                   c_last_name,c_first_name,d_date,sum(q13.c3) c3,count(*) c4
                                 from
                                   (
                                    select c_last_name,c_first_name,d_date,1 as c3
                                    from
                                      customer, date_dim, store_sales
                                    where
                                      (d_month_seq between 1215 and 1226) and
                                      (ss_customer_sk = c_customer_sk) and
                                      (ss_sold_date_sk = d_date_sk) 
                                    union all
                                    select c_last_name,c_first_name,d_date,-1 as c3
                                    from
                                      customer, date_dim, catalog_sales 
                                    where
                                      (d_month_seq between 1215 and 1226) and
                                      (cs_bill_customer_sk = c_customer_sk) and
                                      (cs_sold_date_sk = d_date_sk) 
                                   ) as q13
                                 group by
                                   c_last_name,
                                   c_first_name,
                                   d_date
                                ) as q14
                              where
                                ((q14.c4 - case when (q14.c3 >= 0) then q14.c3 else -(q14.c3) end) >= 2)
                             ) as q15
                          ) as q16
                          union all
      					select c_last_name,c_first_name,d_date,-1 as c3
                        	from
                          	customer,date_dim,web_sales 
                        	where
      	                    (d_month_seq between 1215 and 1226) and
      	                    (ws_bill_customer_sk = c_customer_sk) and
      	                    (ws_sold_date_sk = d_date_sk) 
                          ) as q18
                     group by
                       c_last_name,
                       c_first_name,
                       d_date
                    ) as q19
                  where
                    ((q19.c4 - case when (q19.c3 >= 0) then q19.c3 else -(q19.c3) end) >= 2)
      

      Summary

      Operator             #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail                         
      ---------------------------------------------------------------------------------------------------------------------------------
      31:AGGREGATE              1  414.293ms  414.293ms        1           1   92.00 KB        -1.00 B  FINALIZE                       
      30:EXCHANGE               1  151.874us  151.874us       20           1          0        -1.00 B  UNPARTITIONED                  
      19:AGGREGATE             20  226.910ms  294.268ms       20           1   12.00 KB       10.00 MB                                 
      29:AGGREGATE             20    7s391ms    8s845ms   44.51M      75.60B    1.13 GB      933.80 GB  FINALIZE                       
      28:EXCHANGE              20  517.932ms  631.483ms  233.99M      75.60B          0              0  HASH(c_last_name,c_first_na... 
      18:AGGREGATE             20   17s312ms   25s953ms  233.99M      75.60B    1.76 GB     6626.13 GB  STREAMING                      
      00:UNION                 20   19s022ms   28s694ms    2.27B      75.60B  948.10 KB              0                                 
      |--17:HASH JOIN          20    3s257ms    4s993ms    2.16B      10.80B    2.06 MB      298.19 KB  INNER JOIN, BROADCAST          
      |  |--27:EXCHANGE        20   20.031us   33.432us      365       7.30K          0              0  BROADCAST                      
      |  |  14:SCAN HDFS        1   20.955ms   20.955ms      365       7.30K    5.86 MB       48.00 MB  tpcds_15000_decimal_parquet... 
      |  16:HASH JOIN          20    7s244ms   10s364ms    2.16B      10.80B  394.04 MB      208.20 MB  INNER JOIN, BROADCAST          
      |  |--26:EXCHANGE        20  177.055ms  198.769ms    3.85M       3.85M          0              0  BROADCAST                      
      |  |  13:SCAN HDFS       18   12.368ms   17.726ms    3.85M       3.85M   35.96 MB       72.00 MB  tpcds_15000_decimal_parquet... 
      |  15:SCAN HDFS          20  557.460ms  825.397ms    2.16B      10.80B  120.83 MB       88.00 MB  tpcds_15000_decimal_parquet... 
      25:AGGREGATE             20   17s407ms   24s977ms  109.12M      64.80B    2.01 GB     5679.60 GB  FINALIZE                       
      24:EXCHANGE              20    1s422ms    2s113ms  593.83M      64.80B          0              0  HASH(c_last_name,c_first_na... 
      12:AGGREGATE             20      1m25s      1m48s  593.83M      64.80B    4.01 GB     5679.60 GB  STREAMING                      
      01:UNION                 20      1m39s       2m3s   12.32B      64.80B  883.75 KB              0                                 
      |--11:HASH JOIN          20    6s108ms    7s925ms    4.26B      21.60B    2.06 MB      298.19 KB  INNER JOIN, BROADCAST          
      |  |--23:EXCHANGE        20   18.296us   32.651us      365       7.30K          0              0  BROADCAST                      
      |  |  08:SCAN HDFS        1   25.291ms   25.291ms      365       7.30K    5.86 MB       48.00 MB  tpcds_15000_decimal_parquet... 
      |  10:HASH JOIN          20   13s701ms   17s430ms    4.26B      21.60B  394.04 MB      208.20 MB  INNER JOIN, BROADCAST          
      |  |--22:EXCHANGE        20  159.064ms  181.949ms    3.85M       3.85M          0              0  BROADCAST                      
      |  |  07:SCAN HDFS       18    8.930ms   11.820ms    3.85M       3.85M   35.96 MB       72.00 MB  tpcds_15000_decimal_parquet... 
      |  09:SCAN HDFS          20  999.371ms    1s237ms    4.27B      21.60B  212.29 MB       88.00 MB  tpcds_15000_decimal_parquet... 
      06:HASH JOIN             20   11s606ms   14s429ms    8.06B      43.20B    2.06 MB      298.19 KB  INNER JOIN, BROADCAST          
      |--21:EXCHANGE           20   16.435us   27.733us      365       7.30K          0              0  BROADCAST                      
      |  03:SCAN HDFS           1   19.860ms   19.860ms      365       7.30K    5.86 MB       48.00 MB  tpcds_15000_decimal_parquet... 
      05:HASH JOIN             20   21s797ms   27s529ms    8.06B      43.20B  394.04 MB      208.20 MB  INNER JOIN, BROADCAST          
      |--20:EXCHANGE           20  152.883ms  181.123ms    3.85M       3.85M          0              0  BROADCAST                      
      |  02:SCAN HDFS          18    8.797ms   12.102ms    3.85M       3.85M   36.02 MB       72.00 MB  tpcds_15000_decimal_parquet... 
      04:SCAN HDFS             20    1s807ms    2s247ms    8.25B      43.20B  212.40 MB       88.00 MB  tpcds_15000_decimal_parquet... 
      

      Attachments

        Issue Links

          Activity

            People

              tarasbob Taras Bobrovytsky
              mmokhtar Mostafa Mokhtar
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: