Description
Running:
sql("SELECT id as a FROM RANGE(10)").createOrReplaceTempView("A")
sql("SELECT NULL as a FROM RANGE(10)").createOrReplaceTempView("NULLTAB")
sql("SELECT 1 as goo FROM A LEFT OUTER JOIN NULLTAB ON A.a = NULLTAB.a").collect()
results in:
org.apache.spark.sql.AnalysisException: Detected cartesian product for LEFT OUTER join between logical plans
Project
+- Range (0, 10, step=1, splits=None)
and
Project
+- Range (0, 10, step=1, splits=None)
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these relations.;
  at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$21.applyOrElse(Optimizer.scala:1121)
This is because NULLTAB.a is constant-folded to null, and the NullPropagation rule then folds the entire join condition (a#0L = null) to null:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.NullPropagation ===
 GlobalLimit 21                                       GlobalLimit 21
 +- LocalLimit 21                                     +- LocalLimit 21
    +- Project [1 AS goo#28]                             +- Project [1 AS goo#28]
!      +- Join LeftOuter, (a#0L = null)                     +- Join LeftOuter, null
          :- Project [id#1L AS a#0L]                           :- Project [id#1L AS a#0L]
          :  +- Range (0, 10, step=1, splits=None)             :  +- Range (0, 10, step=1, splits=None)
          +- Project                                           +- Project
             +- Range (0, 10, step=1, splits=None)                +- Range (0, 10, step=1, splits=None)
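To make the folding step concrete, here is a minimal sketch in plain Scala that does not use Spark. The classes Expr, Attr, Lit and EqualTo and the function propagateNulls are hypothetical stand-ins, not Catalyst's API. The point is only that an equality with a NULL literal on either side can never be true or false, so it can be replaced by a NULL literal, which is the (a#0L = null) to null rewrite shown in the NullPropagation log.

```scala
// Toy expression tree; these classes are hypothetical, not Catalyst's.
trait Expr
case class Attr(name: String) extends Expr        // column reference such as A.a
case class Lit(value: Option[Long]) extends Expr  // literal; None stands for SQL NULL
case class EqualTo(left: Expr, right: Expr) extends Expr

// Simplified NullPropagation-style step: an equality with a NULL literal
// on either side can only evaluate to NULL, so it is replaced by that literal.
def propagateNulls(e: Expr): Expr = e match {
  case EqualTo(l, r) =>
    (propagateNulls(l), propagateNulls(r)) match {
      case (Lit(None), _) | (_, Lit(None)) => Lit(None)
      case (pl, pr)                        => EqualTo(pl, pr)
    }
  case other => other
}

// NULLTAB.a has already been folded to NULL, so the join condition is A.a = NULL
val condition = EqualTo(Attr("A.a"), Lit(None))
println(propagateNulls(condition)) // prints Lit(None): the whole condition is now NULL
```

A condition between two real columns, such as EqualTo(Attr("A.a"), Attr("NULLTAB.a")), is left unchanged by this sketch; only the already-folded NULL side triggers the rewrite.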
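CheckCartesianProducts then rejects the plan, even though the join does not produce a cartesian product: the condition simply evaluates to null, and because a null condition never matches, the left outer join returns each row of A exactly once, with nulls for the NULLTAB columns.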
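The sketch below, again plain Scala with hypothetical names (looksCartesian, leftOuterJoin), contrasts the two sides of the problem. It assumes the check flags a join when no conjunct of its condition references columns from both inputs; that matches the "missing or trivial" wording of the error but is a simplification, not Spark's actual rule. Under that assumption a literal NULL condition is flagged, while left outer join semantics still yield one output row per left row (10 here) rather than the 10 * 10 = 100 rows of a cartesian product.

```scala
// Toy model in plain Scala; all names are hypothetical, not Spark's classes.

// (1) Simplified check, ASSUMING a join is flagged when no conjunct of the
//     condition references columns from both the left and the right input.
def looksCartesian(
    conjunctRefs: Seq[Set[String]],
    leftCols: Set[String],
    rightCols: Set[String]): Boolean =
  !conjunctRefs.exists(refs => refs.exists(leftCols.contains) && refs.exists(rightCols.contains))

val leftCols  = Set("A.a")
val rightCols = Set("NULLTAB.a")

// Before folding: A.a = NULLTAB.a references both sides, so it is not flagged
val beforeFold = looksCartesian(Seq(Set("A.a", "NULLTAB.a")), leftCols, rightCols)
// After folding: the literal NULL references no columns at all, so it is flagged
val afterFold = looksCartesian(Seq(Set.empty[String]), leftCols, rightCols)

// (2) LEFT OUTER JOIN semantics with a three-valued condition
//     (Some(true), Some(false), or None for SQL NULL).
def leftOuterJoin[L, R](left: Seq[L], right: Seq[R])(
    cond: (L, R) => Option[Boolean]): Seq[(L, Option[R])] =
  left.flatMap { l =>
    // NULL is not true, so it never produces a match
    val matches = right.filter(r => cond(l, r).contains(true))
    if (matches.isEmpty) Seq((l, Option.empty[R])) // keep the left row, pad the right side with NULL
    else matches.map(r => (l, Some(r)))
  }

val a       = (0L until 10L).toSeq             // A.a: 0..9
val nulltab = Seq.fill(10)(Option.empty[Long]) // NULLTAB.a: ten NULLs

// The folded condition is the constant NULL for every pair of rows
val rows = leftOuterJoin(a, nulltab)((_, _) => None)

println(s"flagged before fold: $beforeFold, after fold: $afterFold")
println(s"rows returned: ${rows.size}") // one per left row, not 10 * 10
```

Passing a condition that is always Some(true) instead would return all 100 pairs; that is the case a cartesian-product check is meant to catch, and it is not what a constant NULL condition produces.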