Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 1.1.0
None
None
Environment: Mac OSX, local mode, but should hold true for all environments
Description
In the following example, I would expect the "grouped" schema to contain two fields, the String name and the Long count, but it only contains the Long count.
// Assumes val sc = new SparkContext(...), e.g., in Spark Shell
import org.apache.spark.sql.{SQLContext, SchemaRDD}
import org.apache.spark.sql.catalyst.expressions._
val sqlc = new SQLContext(sc)
import sqlc._

case class Record(name: String, n: Int)
val records = List(
  Record("three", 1),
  Record("three", 2),
  Record("two", 3),
  Record("three", 4),
  Record("two", 5))
val recs = sc.parallelize(records)
recs.registerTempTable("records")

val grouped = recs.select('name, 'n).groupBy('name)(Count('n) as 'count)
grouped.printSchema
// root
//  |-- count: long (nullable = false)
grouped foreach println
// [2]
// [3]
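To make the expectation concrete, here is a minimal sketch of the result shape I had in mind, written with plain Scala collections rather than Spark. The object name `ExpectedGrouping` and the helper `countsByName` are hypothetical names used only for illustration; they are not part of any Spark API.

```scala
// Plain Scala collections only; no Spark required.
case class Record(name: String, n: Int)

object ExpectedGrouping {
  val records = List(
    Record("three", 1),
    Record("three", 2),
    Record("two", 3),
    Record("three", 4),
    Record("two", 5))

  // Group by name and keep the grouping key next to the row count,
  // i.e. the two-field (name, count) shape expected from "grouped".
  def countsByName(rs: Seq[Record]): Map[String, Long] =
    rs.groupBy(_.name).map { case (name, group) => name -> group.size.toLong }

  def main(args: Array[String]): Unit =
    // Sorted by name so the printed order is deterministic.
    countsByName(records).toSeq.sortBy(_._1).foreach(println)
    // (three,3)
    // (two,2)
}
```

As far as I can tell, the second argument list of `groupBy` in the Spark 1.1 DSL determines the output columns, so listing `'name` there as well (for example `groupBy('name)('name, Count('n) as 'count)`) may produce the expected schema. This is an assumption on my part and is not confirmed anywhere in this report.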