[SPARK-38221] Group by a stream of complex expressions fails


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.1, 3.3.0
    • Fix Version/s: 3.3.0, 3.2.2
    • Component/s: SQL
    • Labels: None

    Description

      This query fails:

      scala> Seq(1).toDF("id").groupBy(Stream($"id" + 1, $"id" + 2): _*).sum("id").show(false)
      java.lang.IllegalStateException: Couldn't find _groupingexpression#24 in [id#4,_groupingexpression#23]
        at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
        at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:425)
        at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:73)
        at org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:94)
        at scala.collection.immutable.Stream.$anonfun$map$1(Stream.scala:418)
        at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1173)
        at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1163)
        at scala.collection.immutable.Stream.$anonfun$map$1(Stream.scala:418)
        at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1173)
        at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1163)
        at scala.collection.immutable.Stream.foreach(Stream.scala:534)
        at scala.collection.TraversableOnce.count(TraversableOnce.scala:152)
        at scala.collection.TraversableOnce.count$(TraversableOnce.scala:145)
        at scala.collection.AbstractTraversable.count(Traversable.scala:108)
        at org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.createCode(GenerateUnsafeProjection.scala:293)
        at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsumeWithKeys(HashAggregateExec.scala:623)
      

      However, if Stream is replaced with Seq, the same query works:

      scala> Seq(1).toDF("id").groupBy(Seq($"id" + 1, $"id" + 2): _*).sum("id").show(false)
      +--------+--------+-------+
      |(id + 1)|(id + 2)|sum(id)|
      +--------+--------+-------+
      |2       |3       |1      |
      +--------+--------+-------+
      
      scala> 
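
      A possible workaround, inferred only from the Seq behaviour above rather than from the eventual fix: materialize the lazy Stream into a strict collection (for example with .toList) before passing it to groupBy. In the same spark-shell session this should behave like the Seq example:

      // Hypothetical workaround sketch (not part of the original report): forcing the
      // lazy Stream into a strict List means the grouping expressions are mapped
      // eagerly, like the working Seq case, instead of lazily during code generation.
      Seq(1).toDF("id").groupBy(Stream($"id" + 1, $"id" + 2).toList: _*).sum("id").show(false)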
      

       

People

    Assignee: Unassigned
    Reporter: Bruce Robbins (bersprockets)