SPARK-9950 (sub-task of SPARK-9565, Spark SQL 1.5.0 QA/testing umbrella)

Wrong Analysis Error for grouping/aggregating on struct fields

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.5.0
    • Component/s: SQL
    • Labels: None
    • Sprint: Spark 1.5 doc/QA sprint

    Description

      Spark 1.4:

      import org.apache.spark.sql.functions._
      val df = Seq(("x", (1,1)), ("y", (2, 2))).toDF("a", "b")
      df.groupBy("b._1").agg(sum("b._2")).collect()
      
      df: org.apache.spark.sql.DataFrame = [a: string, b: struct<_1:int,_2:int>]
      res0: Array[org.apache.spark.sql.Row] = Array([1,1], [2,2])
      

      Spark 1.5 (same query):

      org.apache.spark.sql.AnalysisException: expression 'b' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.;
      	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:37)
      	at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
      	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:110)
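
      One way to express the same aggregation without grouping directly on a struct field is to project the nested fields to top-level columns first. The following is only a sketch: it assumes a spark-shell session where `df` is defined as above, the column names `key` and `value` are made up for illustration, and nothing in this issue confirms it as a workaround for the 1.5 behaviour.

```scala
import org.apache.spark.sql.functions._

// Assumes `df` from the description: [a: string, b: struct<_1:int,_2:int>]
// Pull the struct fields out into plain columns ("key" and "value" are
// illustrative names, not part of the original report).
val flat = df.select(df("b._1").as("key"), df("b._2").as("value"))

// Group and aggregate on the top-level columns only.
val result = flat.groupBy("key").agg(sum("value")).collect()
```

      If this behaves like the Spark 1.4 query, `result` should hold one row per distinct `b._1` with the sum of the matching `b._2` values.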
      

          People

            Assignee: cloud_fan Wenchen Fan
            Reporter: marmbrus Michael Armbrust