Description
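I think this is a correctness bug in Spark 3.1.1: applying a UDF inside transform appears to return the last element's result in every position of the array.
The behavior is correct in Spark 3.0.1.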
In Spark 3.0.1:
scala> import spark.implicits._
scala> import org.apache.spark.sql.functions._
scala> val x = Seq(Seq("aa", "bb", "cc")).toDF
x: org.apache.spark.sql.DataFrame = [value: array<string>]
scala> x.select(transform(col("value"), col => udf((_: String).drop(1)).apply(col))).show
+---------------------------------------------------+
|transform(value, lambdafunction(UDF(lambda 'x), x))|
+---------------------------------------------------+
|                                          [a, b, c]|
+---------------------------------------------------+
In Spark 3.1.1:
scala> import spark.implicits._
scala> import org.apache.spark.sql.functions._
scala> val x = Seq(Seq("aa", "bb", "cc")).toDF
x: org.apache.spark.sql.DataFrame = [value: array<string>]
scala> x.select(transform(col("value"), col => udf((_: String).drop(1)).apply(col))).show
+---------------------------------------------------+
|transform(value, lambdafunction(UDF(lambda 'x), x))|
+---------------------------------------------------+
|                                          [c, c, c]|
+---------------------------------------------------+