Description
Join followed by Group by on the same key is a recurring pattern which the Planner can optimize by pushing down a partial aggregate below the join to reduce the over cardinality.
TPC-DS query 23 is on of several queries that can benefit from such optimization
The pattern looks like the following
select c_customer_sk, sum(ss_quantity * ss_sales_price) csales from store_sales, customer, date_dim where ss_customer_sk = c_customer_sk and ss_sold_date_sk = d_date_sk and d_year in (1999 , 1999 + 1, 1999 + 2, 1999 + 3) group by c_customer_sk
A rewritten version of the query produces a 2x speedup since the NDV for ss_customer_sk is much smaller than number of rows in store_sales.
with ss as (select ss_customer_sk,sum(ss_quantity * ss_sales_price) csales from store_sales, date_dim where ss_sold_date_sk = d_date_sk and d_year in (1999 , 1999 + 1, 1999 + 2, 1999 + 3) group by ss_customer_sk ) select count(*) from (select c_customer_sk, sum(csales) csales from ss, customer where ss_customer_sk = c_customer_sk group by c_customer_sk)a;