Details
-
Sub-task
-
Status: In Progress
-
Major
-
Resolution: Unresolved
-
3.1.0
-
None
-
None
Description
Inner Join Elimination based on Informational RI constraints
This transformation detects RI joins and eliminates the parent/PK table if none of its columns, other than the PK columns, are referenced in the query.
Typical examples that benefit from this rewrite are queries over complex views.
View using TPC-DS schema:
create view customer_purchases_2002 (id, last, first, product, store_id, month, quantity) as select c_customer_id, c_last_name, c_first_name, i_product_name, s_store_id, d_moy, ss_quantity from store_sales, date_dim, customer, item, store where d_date_sk = ss_sold_date_sk and c_customer_sk = ss_customer_sk and i_item_sk = ss_item_sk and s_store_sk = ss_store_sk and d_year = 2002
The view returns customer purchases made in year 2002. It is a join between fact table store_sales and dimensions customer, item, store, and date. The tables are joined using RI predicates.
If we write a query that only selects a subset of columns from the view, for example, we are only interested in the items bought and not the stores, internally, the Optimizer, will first merge the view into the query, and then, based on the primary key – foreign key join predicate analysis, it will decide that the join with the store table is not needed, and therefore the store table is removed.
Query:
select id, first, last, product, quantity from customer_purchases_2002 where product like ‘bicycle%’ and month between 1 and 2
Internal query after view expansion:
select c_customer_id as id, c_first_name as first, c_last_name as last, i_product_name as product,ss_quantity as quantity from store_sales, date_dim, customer, item, store where d_date_sk = ss_sold_date_sk and c_customer_sk = ss_customer_sk and i_item_sk = ss_item_sk and s_store_sk = ss_store_sk and d_year = 2002 and month between 1 and 2 and product like ‘bicycle%’
Internal optimized query after join elimination:
select c_customer_id as id, c_first_name as first, c_last_name as last, i_product_name as product,ss_quantity as quantity from store_sales, date_dim, customer, item where d_date_sk = ss_sold_date_sk and c_customer_sk = ss_customer_sk and i_item_sk = ss_item_sk and d_year = 2002 and month between 1 and 2 and product like ‘bicycle%’
The join with store table can be removed since no columns are retrieved from the table, and every row from the store_sales fact table will find a match in store based on the RI relationship.