Description
In some case, We can infer filters from Join condition with OR expressions.
for example, tpc-ds query 48:
select sum (ss_quantity) from store_sales, store, customer_demographics, customer_address, date_dim where s_store_sk = ss_store_sk and ss_sold_date_sk = d_date_sk and d_year = 2000 and ( ( cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'S' and cd_education_status = 'Secondary' and ss_sales_price between 100.00 and 150.00 ) or ( cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'M' and cd_education_status = 'College' and ss_sales_price between 50.00 and 100.00 ) or ( cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'U' and cd_education_status = '2 yr Degree' and ss_sales_price between 150.00 and 200.00 ) ) and ( ( ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('AL', 'OH', 'MD') and ss_net_profit between 0 and 2000 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('VA', 'TX', 'IA') and ss_net_profit between 150 and 3000 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('RI', 'WI', 'KY') and ss_net_profit between 50 and 25000 ) ) ;
we can infer two filters from the join or condidtion:
for customer_demographics: cd_marital_status in(‘D',‘U',‘M') and cd_education_status in('4 yr Degree’,’Secondary’,’Primary') for store_sales: (ss_sales_price between 100.00 and 150.00 or ss_sales_price between 50.00 and 100.00 or ss_sales_price between 150.00 and 200.00)
then then we can push down the above two filters to filter customer_demographics/store_sales.
A pr will be submit soon.
Attachments
Issue Links
- duplicates
-
SPARK-31705 Push more possible predicates through Join via CNF
- Resolved
- relates to
-
SPARK-31705 Push more possible predicates through Join via CNF
- Resolved
- links to