Details
Bug
Status: Resolved
Minor
Resolution: Won't Fix
2.3.0
None
None
Description
See failing test at https://github.com/apache/spark/pull/19917
Failing:
test("SPARK-ABC123: support select with a splatted stream") {
  val df = spark.createDataFrame(
    sparkContext.emptyRDD[Row],
    StructType(List("bar", "foo").map { StructField(_, StringType, false) }))
  val allColumns = Stream(df.col("bar"), col("foo"))
  val result = df.select(allColumns : _*)
}
Succeeds:
test("SPARK-ABC123: support select with a splatted stream") {
  val df = spark.createDataFrame(
    sparkContext.emptyRDD[Row],
    StructType(List("bar", "foo").map { StructField(_, StringType, false) }))
  val allColumns = Seq(df.col("bar"), col("foo"))
  val result = df.select(allColumns : _*)
}
After stepping through in a debugger, the difference manifests at https://github.com/apache/spark/blob/8ae004b4602266d1f210e4c1564246d590412c06/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala#L120
Changing seq.map to seq.toList.map causes the test to pass.
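I think there's a very subtle bug here: the Seq of columns passed into select is expected to be evaluated eagerly when .map is called on it, even though eager evaluation is not part of the Seq contract. A Stream is a valid Seq, but its .map is lazy, so the mapping function has not yet been applied to every element when .map returns.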
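
The laziness described above can be reproduced outside Spark. The following is a minimal sketch, not Spark code: the object name StreamMapLaziness and the helper applied are hypothetical and chosen only for illustration. It assumes a Scala version that still provides Stream (it is deprecated in favor of LazyList from 2.13 onward). The helper records which elements the mapping function has been called on by the time .map returns, without ever traversing the result.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration; none of these names come from Spark.
object StreamMapLaziness {

  // Returns the elements that the mapping function was applied to
  // by the time `xs.map` returned. The mapped result is never traversed.
  def applied(xs: Seq[String]): List[String] = {
    val seen = ArrayBuffer.empty[String]
    val mapped = xs.map { x => seen += x; x.toUpperCase }
    seen.toList
  }

  def main(args: Array[String]): Unit = {
    // List.map is strict: every element is processed immediately.
    println(applied(List("bar", "foo")))          // List(bar, foo)

    // Stream.map is lazy in its tail: only the head is processed.
    println(applied(Stream("bar", "foo")))        // List(bar)

    // Forcing the Stream first restores strict behavior.
    println(applied(Stream("bar", "foo").toList)) // List(bar, foo)
  }
}
```

Any caller that relies on the mapping function having run for every element by the time .map returns will see different behavior for a Stream than for a List, which would be consistent with seq.toList.map making the test pass.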