Description
This problem was found during the development of SPARK-18874.
The EXISTS form in the following query:
sql("select * from t1 inner join t2 on t1.t1a=t2.t2a where exists (select 1 from t3 where t1.t1b=t3.t3b)")
gives the optimized plan below:
== Optimized Logical Plan == Join Inner, (t1a#7 = t2a#25) :- Join LeftSemi, (t1b#8 = t3b#58) : :- Filter isnotnull(t1a#7) : : +- Relation[t1a#7,t1b#8,t1c#9] parquet : +- Project [1 AS 1#271, t3b#58] : +- Relation[t3a#57,t3b#58,t3c#59] parquet +- Filter isnotnull(t2a#25) +- Relation[t2a#25,t2b#26,t2c#27] parquet
whereas a semantically equivalent Left Semi join query below:
sql("select * from t1 inner join t2 on t1.t1a=t2.t2a left semi join t3 on t1.t1b=t3.t3b")
gives the following optimized plan:
== Optimized Logical Plan == Join LeftSemi, (t1b#8 = t3b#58) :- Join Inner, (t1a#7 = t2a#25) : :- Filter (isnotnull(t1b#8) && isnotnull(t1a#7)) : : +- Relation[t1a#7,t1b#8,t1c#9] parquet : +- Filter isnotnull(t2a#25) : +- Relation[t2a#25,t2b#26,t2c#27] parquet +- Project [t3b#58] +- Relation[t3a#57,t3b#58,t3c#59] parquet
Attachments
Issue Links
- is duplicated by
-
SPARK-22573 SQL Planner is including unnecessary columns in the projection
- Resolved
- links to
(6 links to)