Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
4.0.0, 3.5.1, 3.4.3
Description
val ds1 = Seq(1).toDS()
val ds2 = Seq[Int]().toDS()
val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity)
ds1.join(ds2, ds1("value") === ds2("value"), "outer").select(f(struct(ds1("value"), ds2("value")))).show()
ds1.join(ds2, ds1("value") === ds2("value"), "outer").select(struct(ds1("value"), ds2("value"))).show()
outputs
+---------------------------------------+
|UDF(struct(value, value, value, value))|
+---------------------------------------+
|                                 {1, 0}|
+---------------------------------------+

+--------------------+
|struct(value, value)|
+--------------------+
|           {1, NULL}|
+--------------------+
So when the result is passed to a UDF, the nullability after the join is not respected, and we incorrectly end up with a 0 value instead of a null/None value.
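The report does not say where the 0 comes from. As a general Scala illustration only, and not a claim about Spark's internals, the sketch below shows how a null reference can silently become 0 when it is read as a primitive Int, while a null-aware read yields None. The object and method names (NullUnboxingSketch, viaOption, viaPrimitiveCast) are hypothetical and exist only for this example.

```scala
// Hypothetical sketch: plain Scala, no Spark dependency.
object NullUnboxingSketch {

  // Null-aware read: a null reference becomes None, a value becomes Some(value).
  def viaOption(x: java.lang.Integer): Option[Int] =
    Option(x).map(_.intValue)

  // Read that assumes the value is never null: casting null to Int
  // yields the primitive default 0 instead of failing.
  def viaPrimitiveCast(x: Any): Int =
    x.asInstanceOf[Int]

  def main(args: Array[String]): Unit = {
    println(viaOption(null))        // prints None
    println(viaPrimitiveCast(null)) // prints 0
  }
}
```

If something similar happens when the UDF input is decoded, a field wrongly treated as non-nullable would produce exactly the {1, 0} row shown above, but that is an assumption and not something the report confirms.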