Description
Today, null safe joins are executed with a Cartesian product.
scala> sqlContext.sql("select * from t a join t b on (a.i <=> b.i)").explain
== Physical Plan ==
TungstenProject [i#2,j#3,i#7,j#8]
Filter (i#2 <=> i#7)
CartesianProduct
LocalTableScan [i#2,j#3], [[1,1]]
LocalTableScan [i#7,j#8], [[1,1]]
One option is to add this rewrite to the optimizer:
select * from t a join t b on coalesce(a.i, <default>) = coalesce(b.i, <default>) AND (a.i <=> b.i)
Acceptance criteria: joins with only null safe equality should not result in a Cartesian product.
Attachments
Issue Links
- is duplicated by
-
SPARK-6072 Enable hash joins for null-safe equality predicates
- Resolved
- links to