Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
Impala 2.0
-
None
-
None
Description
I added a partition filter (predicate on sr_returned_date_sk) to TPC-DS query 1 to limit the scan of the store_returns table. The generated plan has a couple problems:
(1) The store_returns table is scanned twice (self join in query)
(2) One of the scans ignores the partition filter
Query:
explain with customer_total_return as (select sr_customer_sk as ctr_customer_sk ,sr_store_sk as ctr_store_sk ,sum(SR_RETURN_TAX) as ctr_total_return from store_returns ,date_dim where sr_returned_date_sk = d_date_sk and d_year =2002 and sr_returned_date_sk between 2452276 and 2452640 group by sr_customer_sk ,sr_store_sk) select * from ( select c_customer_id from customer_total_return ctr1 ,store ,customer where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2 from customer_total_return ctr2 where ctr1.ctr_store_sk = ctr2.ctr_store_sk) and s_store_sk = ctr1.ctr_store_sk and s_state = 'LA' and ctr1.ctr_customer_sk = c_customer_sk order by c_customer_id ) as big_return_customers limit 100;
Plan is attached.