SPARK-22725: df.select on a Stream is broken, vs a List

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

      Description

      See failing test at https://github.com/apache/spark/pull/19917

      Failing:

        test("SPARK-ABC123: support select with a splatted stream") {
          val df = spark.createDataFrame(sparkContext.emptyRDD[Row], StructType(List("bar", "foo").map {
            StructField(_, StringType, false)
          }))
          val allColumns = Stream(df.col("bar"), col("foo"))
          val result = df.select(allColumns : _*)
        }
      

      Succeeds:

        test("SPARK-ABC123: support select with a splatted stream") {
          val df = spark.createDataFrame(sparkContext.emptyRDD[Row], StructType(List("bar", "foo").map {
            StructField(_, StringType, false)
          }))
          val allColumns = Seq(df.col("bar"), col("foo"))
          val result = df.select(allColumns : _*)
        }
      

      After stepping through in a debugger, the difference manifests at https://github.com/apache/spark/blob/8ae004b4602266d1f210e4c1564246d590412c06/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala#L120
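
      For context, the pattern at that line looks roughly like the following self-contained sketch (an approximation for illustration with a made-up Node type, not the exact Spark source): whether any element changed is recorded in a mutable flag that is set only as a side effect of the map.

        // Simplified sketch (hypothetical Node type, not the actual Spark source)
        // of the flag-setting pattern at the linked QueryPlan.scala line.
        final case class Node(args: Seq[String]) {
          def mapArgs(f: String => String): Node = {
            var changed = false
            val newArgs = args.map { a =>
              val newA = f(a)
              if (newA != a) changed = true // side effect inside map
              newA
            }
            // A Stream defers (most of) the map above, so `changed` can still
            // be false here even when a later element would change, and the
            // transformed result is discarded.
            if (changed) Node(newArgs) else this
          }
        }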

      Changing seq.map to seq.toList.map causes the test to pass.

      I think there's a very subtle bug here: the Seq of columns passed into select is expected to evaluate eagerly, running its side effects, when .map is called on it, even though eager evaluation is not part of the Seq contract; Stream.map is lazy.
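
      Running the sketch above with input where only a later element changes, mirroring "bar" (already resolved) against "foo" (still unresolved), shows the divergence; the names here are hypothetical, not Spark code:

        // Only the second element is changed by the transform.
        val resolve = (s: String) => s.stripPrefix("unresolved:")

        // Eager List: the change to the second element is seen immediately.
        Node(List("bar", "unresolved:foo")).mapArgs(resolve)
        // => Node(List(bar, foo))

        // Lazy Stream: the tail has not been mapped when the flag is checked,
        // so the original node, still containing "unresolved:foo", is returned.
        Node(Stream("bar", "unresolved:foo")).mapArgs(resolve)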

    People

    • Assignee: Unassigned
    • Reporter: Andrew Ash (aash)
    • Votes: 0
    • Watchers: 3
