Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0
Description
When a NOT IN predicate compares more than one column, the query may return an incorrect result if the subquery data contains a null value. We can demonstrate the problem with the following data set and query, which returns an empty result:
Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1")
Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2")
sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show
+---+---+
| a1| b1|
+---+---+
+---+---+
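To see why the empty result is wrong, it may help to evaluate the predicate by hand under standard SQL three-valued logic. The comparison (2,1) = (1,null) expands to 2 = 1 AND 1 = null, which is FALSE AND UNKNOWN, which is FALSE. The NOT IN is therefore TRUE, and the row (2,1) should be returned. The sketch below models that evaluation in plain Scala. It is an illustration only, not Spark's implementation, and every name in it (NotInSemantics, Tri, eq3, and3, or3, not3, notIn) is hypothetical.

```scala
// Illustrative model of SQL three-valued logic; not Spark code.
// None stands for SQL NULL (a value) or UNKNOWN (a truth value).
object NotInSemantics {
  type Tri = Option[Boolean]
  type Row = (Option[Int], Option[Int])

  // Equality is UNKNOWN as soon as either operand is NULL.
  def eq3(x: Option[Int], y: Option[Int]): Tri =
    for (a <- x; b <- y) yield a == b

  // AND: FALSE dominates, then UNKNOWN, then TRUE.
  def and3(p: Tri, q: Tri): Tri = (p, q) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // OR: TRUE dominates, then UNKNOWN, then FALSE.
  def or3(p: Tri, q: Tri): Tri = (p, q) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  def not3(p: Tri): Tri = p.map(!_)

  // Row equality is the conjunction of the column-wise equalities.
  def rowEq(l: Row, r: Row): Tri =
    and3(eq3(l._1, r._1), eq3(l._2, r._2))

  // NOT IN is the negation of "equal to some row of the subquery".
  def notIn(l: Row, rows: Seq[Row]): Tri =
    not3(rows.map(rowEq(l, _)).foldLeft(Some(false): Tri)(or3))

  val t1Row: Row = (Some(2), Some(1))         // the single row of t1
  val t2Rows: Seq[Row] = Seq((Some(1), None)) // the single row of t2

  // A WHERE clause keeps a row only when the predicate is TRUE.
  val keep: Tri = notIn(t1Row, t2Rows)        // Some(true)
}
```

Because the first column comparison is already FALSE, the null in b2 cannot make the overall comparison UNKNOWN, so the filter should keep the row.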
Issue Links
- relates to SPARK-18966 NOT IN subquery with correlated expressions may return incorrect result (Resolved)