Details
Improvement
Status: Resolved
Major
Resolution: Incomplete
2.0.0
None
Description
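Distinct is an expensive operation (it is typically executed as an aggregation, which usually requires a shuffle), so we should avoid it whenever possible. When the child operators already guarantee that their output rows are distinct, the Distinct is redundant and can be removed.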
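The rule described above can be sketched as a bottom-up rewrite over a logical plan. This is a minimal sketch under stated assumptions: the node classes `Relation`, `Distinct`, `Filter` and `LeftSemiJoin`, and the helpers `is_distinct` and `remove_redundant_distinct`, are hypothetical simplifications and are not Spark Catalyst classes or rules. Join and filter conditions are omitted or kept as opaque strings. The key property is that an operator which only keeps a subset of its (left) child's rows cannot introduce duplicates.

```python
from dataclasses import dataclass

# Hypothetical, simplified logical-plan nodes (NOT Spark Catalyst classes).
@dataclass(frozen=True)
class Relation:
    name: str

@dataclass(frozen=True)
class Distinct:
    child: object

@dataclass(frozen=True)
class Filter:
    child: object
    condition: str  # opaque predicate, only carried along

@dataclass(frozen=True)
class LeftSemiJoin:
    left: object   # join condition omitted for brevity
    right: object

def is_distinct(plan) -> bool:
    """Return True if the plan's output rows are guaranteed to be unique."""
    if isinstance(plan, Distinct):
        return True
    # Filter and left-semi join only keep a subset of the (left) child's rows,
    # so they preserve distinctness of that child.
    if isinstance(plan, Filter):
        return is_distinct(plan.child)
    if isinstance(plan, LeftSemiJoin):
        return is_distinct(plan.left)
    # Base relations may contain duplicates unless proven otherwise.
    return False

def remove_redundant_distinct(plan):
    """Rewrite children first, then drop a Distinct whose child is already distinct."""
    if isinstance(plan, Distinct):
        child = remove_redundant_distinct(plan.child)
        return child if is_distinct(child) else Distinct(child)
    if isinstance(plan, Filter):
        return Filter(remove_redundant_distinct(plan.child), plan.condition)
    if isinstance(plan, LeftSemiJoin):
        return LeftSemiJoin(remove_redundant_distinct(plan.left),
                            remove_redundant_distinct(plan.right))
    return plan
```

For a three-way intersect of distinct inputs `a`, `b`, `c`, rewritten as `Distinct(LeftSemiJoin(Distinct(LeftSemiJoin(Distinct(a), Distinct(b))), Distinct(c)))`, this sketch keeps only the per-branch Distinct nodes and returns `LeftSemiJoin(LeftSemiJoin(Distinct(a), Distinct(b)), Distinct(c))`.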
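For example, in TPC-DS query 38 below, every input of the Intersect is already distinct because each branch uses select distinct. After Intersect is converted into a left-semi join followed by a Distinct, the top Distinct is therefore redundant and can be removed.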
select count(*) from (
    select distinct c_last_name, c_first_name, d_date
    from store_sales, date_dim, customer
    where store_sales.ss_sold_date_sk = date_dim.d_date_sk
      and store_sales.ss_customer_sk = customer.c_customer_sk
      and d_month_seq between [DMS] and [DMS] + 11
    intersect
    select distinct c_last_name, c_first_name, d_date
    from catalog_sales, date_dim, customer
    where catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
      and catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
      and d_month_seq between [DMS] and [DMS] + 11
    intersect
    select distinct c_last_name, c_first_name, d_date
    from web_sales, date_dim, customer
    where web_sales.ws_sold_date_sk = date_dim.d_date_sk
      and web_sales.ws_bill_customer_sk = customer.c_customer_sk
      and d_month_seq between [DMS] and [DMS] + 11
) hot_cust
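To see why the top Distinct is a no-op here, the following sketch models rows as Python tuples and implements Intersect (set semantics) as a left-semi join followed by a Distinct. The helpers `distinct`, `left_semi_join` and `intersect` and the sample rows are hypothetical illustrations, not Spark code or TPC-DS data. Rows are matched on all columns, and because Python treats `None == None` as true, NULLs compare equal, as they do for INTERSECT.

```python
from typing import Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, ...]

def distinct(rows: Iterable[Row]) -> List[Row]:
    """Remove duplicate rows, keeping the first occurrence of each in order."""
    seen = set()
    out = []
    for r in rows:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out

def left_semi_join(left: Iterable[Row], right: Iterable[Row]) -> List[Row]:
    """Keep each left row that has an equal (whole-row) match on the right."""
    right_set = set(right)
    return [r for r in left if r in right_set]

def intersect(left: Iterable[Row], right: Iterable[Row]) -> List[Row]:
    """INTERSECT with set semantics, expressed as Distinct(LeftSemiJoin(left, right))."""
    return distinct(left_semi_join(left, right))

# Hypothetical (c_last_name, c_first_name, d_date) rows; each branch applies
# select distinct, so every input to the intersect is already duplicate-free.
S = ("Smith", "Ann", "2000-01-01")
store = distinct([S, S, ("Lee", "Bo", "2000-01-02")])
catalog = distinct([S, S, ("Kim", "Cy", "2000-01-03")])
web = distinct([S, ("Lee", "Bo", "2000-01-02")])

# A semi join of a distinct left input stays distinct, so the extra
# Distinct added by the Intersect rewrite does not change the result.
semi = left_semi_join(left_semi_join(store, catalog), web)
print(len(semi), distinct(semi) == semi)
```

This checks the property on sample data only; the general argument is that a left-semi join emits each left row at most once, so it can never create duplicates that a Distinct would have to remove. If the left input were not distinct, for example `left_semi_join([S, S], [S])`, the Distinct would still be required.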