Details
-
Sub-task
-
Status: Open
-
Major
-
Resolution: Unresolved
-
3.1.0
-
None
-
None
Description
Existential Subquery to Inner Join
Another enhancement that uses Informational Constraints is existential subquery to inner join. This rewrite converts an existential subquery to an inner join, and thus provides alternative join choices for the Optimizer based on the selectivity of the tables.
An example using TPC-DS schema is shown below.
select c_first_name, c_last_name, c_email_address from customer c where EXISTS (select * from store_sales, date_dim where c.c_customer_sk = ss_customer_sk and ss_sold_date_sk = d_date_sk and d_year = 2002 and d_moy between 4 and 4+3)
Spark uses left semi-join to evaluated existential subqueries. A left semi-join will return a row from the outer table if there is at least one match in the inner. Semi-join is a general used technique to rewrite existential subqueries, but it has some limitations as it imposes a certain order of the joined table. In this case the large fact table store_sales has to be on the inner of the join. A more efficient execution can be obtained if the subquery is converted to a regular Inner join. This will allow the Optimizer to choose better join orders.
Converting a subquery to inner join is possible if either the subquery produces at most one row or, by introducing a Distinct on the outer table’s row key in order to remove the duplicate rows that will result after the inner join and thus to enforce the semantics of the subquery. As a key for the outer, we can use the primary key of the customer table.
Internal query after rewrite:
select distinct c_customer_sk /*PK */, c_first_name, c_last_name, c_email_address
from customer c, store_sales, date_dim
where c.c_customer_sk = ss_customer_sk and
ss_sold_date_sk = d_date_sk and
d_year = 2002 and
d_moy between 4 and 4+3
Example performance results using 1TB TPC-DS benchmark:
TPC-DS Query | spark-2.2 | spark-2.2 w/ sub2join | Query speedup |
---|---|---|---|
(secs) | (secs) | ||
Q10 | 355 | 190 | 2x |
Q16 | 1394 | 706 | 2x |
Q35 | 462 | 285 | 1.5x |
Q69 | 327 | 173 | 1.5x |
Q94 | 603 | 307 | 2x |