  1. Spark
  2. SPARK-4564

SchemaRDD.groupBy(groupingExprs)(aggregateExprs) doesn't return the groupingExprs as part of the output schema

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Environment: Mac OSX, local mode, but should hold true for all environments

    Description

      In the following example, I would expect the "grouped" schema to contain two fields, the String name and the Long count, but it only contains the Long count.

      // Assumes val sc = new SparkContext(...), e.g., in Spark Shell
      import org.apache.spark.sql.{SQLContext, SchemaRDD}
      import org.apache.spark.sql.catalyst.expressions._
      
      val sqlc = new SQLContext(sc)
      import sqlc._
      
      case class Record(name: String, n: Int)
      
      val records = List(
        Record("three",   1),
        Record("three",   2),
        Record("two",     3),
        Record("three",   4),
        Record("two",     5))
      val recs = sc.parallelize(records)
      recs.registerTempTable("records")
      
      val grouped = recs.select('name, 'n).groupBy('name)(Count('n) as 'count)
      grouped.printSchema
      // root
      //  |-- count: long (nullable = false)
      
      grouped foreach println
      // [2]
      // [3]
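
      A possible workaround, sketched under the assumption that the second parameter list of groupBy defines the complete output of the aggregation (which would also be consistent with the Won't Fix resolution): repeat the grouping column among the aggregate expressions. The name groupedWithName is hypothetical, and the snippet assumes a Spark 1.1 shell where sc is already defined.

```scala
// Sketch for a Spark 1.1 shell; assumes `sc` (a SparkContext) already exists.
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.catalyst.expressions._

val sqlc = new SQLContext(sc)
import sqlc._

case class Record(name: String, n: Int)

val recs = sc.parallelize(List(
  Record("three", 1),
  Record("three", 2),
  Record("two",   3),
  Record("three", 4),
  Record("two",   5)))

// Assumption: only the expressions in the second parameter list appear in
// the result, so the grouping column 'name is listed there explicitly.
val groupedWithName =
  recs.select('name, 'n).groupBy('name)('name, Count('n) as 'count)

// The schema is now expected to contain both name and count.
groupedWithName.printSchema
groupedWithName.collect().foreach(println)
```

      With the grouping column listed explicitly, each output row should carry the name next to its count instead of the count alone.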
      

          People

            Assignee: Unassigned
            Reporter: Dean Wampler (deanwampler)
            Votes: 0
            Watchers: 2
