Description
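Add support for the SQL standard distinct predicate to Spark SQL. The predicate is a null-safe comparison: two NULL values are treated as not distinct from each other, and a NULL is distinct from any non-NULL value.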
<expression> IS [NOT] DISTINCT FROM <expression>
data = [(10, 20), (30, 30), (40, None), (None, None)]
df = sc.parallelize(data).toDF(["c1", "c2"])
df.createTempView("df")
spark.sql("select c1, c2 from df where c1 is not distinct from c2").collect()
[Row(c1=30, c2=30), Row(c1=None, c2=None)]
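
The expected semantics of the example above can be sketched in plain Python, with None standing in for SQL NULL. This is only an illustration of the predicate's truth table, not Spark's implementation; the helper names is_distinct_from and is_not_distinct_from are hypothetical and exist only for this sketch.

```python
def is_distinct_from(a, b):
    """Null-safe inequality: always returns True or False, never 'unknown'."""
    if a is None or b is None:
        # Distinct only when exactly one side is NULL.
        return (a is None) != (b is None)
    return a != b


def is_not_distinct_from(a, b):
    """Null-safe equality: the negation of is_distinct_from."""
    return not is_distinct_from(a, b)


# Same rows as the example above, as plain tuples (c1, c2).
data = [(10, 20), (30, 30), (40, None), (None, None)]

# Equivalent of: select c1, c2 from df where c1 is not distinct from c2
matches = [row for row in data if is_not_distinct_from(*row)]
print(matches)  # [(30, 30), (None, None)]
```

Unlike a plain c1 = c2 comparison, which evaluates to NULL when either side is NULL and so filters out the (None, None) row, the distinct predicate keeps that row.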