Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.3.0
Fix Version/s: None
Description
When fields are added to the result of a non-deterministic UDF that returns a struct, the UDF is executed multiple times for each row (once per field) instead of once.
In this unit test, df1 passes but df2 fails with a message like:
"279751724 did not equal -1023188908"
test("SPARK-XXXXX: non-deterministic UDF should be called once when adding fields") {
  val nondeterministicUDF = udf((s: Int) => {
    val r = Random.nextInt()
    // Both values should be the same
    GroupByKey(r, r)
  }).asNondeterministic()

  val df1 = spark.range(5).select(nondeterministicUDF($"id"))
  df1.collect().foreach { row =>
    assert(row.getStruct(0).getInt(0) == row.getStruct(0).getInt(1))
  }

  val df2 = spark.range(5).select(nondeterministicUDF($"id").withField("new", lit(7)))
  df2.collect().foreach { row =>
    assert(row.getStruct(0).getInt(0) == row.getStruct(0).getInt(1))
  }
}
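To illustrate why the df2 assertion can fail, here is a minimal Spark-free sketch of the difference between evaluating the UDF once per row and once per field. It is only a model of the symptom described above, not Spark's actual execution path. The names `UdfPerFieldSketch`, `Pair`, `evalOnce`, `evalPerField` and the `calls` counter are hypothetical and exist only for this illustration.

```scala
import scala.util.Random

// Hypothetical model of the reported behaviour; none of this is Spark API.
object UdfPerFieldSketch {
  // Stand-in for the struct returned by the UDF (two Int fields).
  final case class Pair(a: Int, b: Int)

  // Counts how many times the "UDF" body runs.
  var calls: Int = 0

  // Non-deterministic "UDF": both fields come from the same random draw.
  def nondeterministicUdf(id: Long): Pair = {
    calls += 1
    val r = Random.nextInt()
    Pair(r, r)
  }

  // Expected behaviour: evaluate once per row, then read both fields
  // and append the new literal field.
  def evalOnce(id: Long): (Int, Int, Int) = {
    val p = nondeterministicUdf(id)
    (p.a, p.b, 7)
  }

  // Reported behaviour (modelled): every extracted field re-evaluates
  // the UDF, so the two values come from different random draws.
  def evalPerField(id: Long): (Int, Int, Int) =
    (nondeterministicUdf(id).a, nondeterministicUdf(id).b, 7)

  def main(args: Array[String]): Unit = {
    calls = 0
    val once = (0L until 5L).map(evalOnce)
    println(s"once per row:   calls=$calls, consistent=${once.forall(t => t._1 == t._2)}")

    calls = 0
    val perField = (0L until 5L).map(evalPerField)
    println(s"once per field: calls=$calls, consistent=${perField.forall(t => t._1 == t._2)}")
  }
}
```

Over five rows, the first variant calls the function 5 times and the second 10 times. In the second variant the two fields are almost always unequal, which matches the "279751724 did not equal -1023188908" style of failure.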