[SPARK-4564] SchemaRDD.groupBy(groupingExprs)(aggregateExprs) doesn't return the groupingExprs as part of the output schema - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 1.1.0
Fix Version/s: None
Component/s: SQL
Labels: None
Environment: Mac OSX, local mode, but should hold true for all environments

Description

In the following example, I would expect the "grouped" schema to contain two fields, the String name and the Long count, but it only contains the Long count.

// Assumes val sc = new SparkContext(...), e.g., in Spark Shell
import org.apache.spark.sql.{SQLContext, SchemaRDD}
import org.apache.spark.sql.catalyst.expressions._

val sqlc = new SQLContext(sc)
import sqlc._

case class Record(name: String, n: Int)

val records = List(
  Record("three",   1),
  Record("three",   2),
  Record("two",     3),
  Record("three",   4),
  Record("two",     5))
val recs = sc.parallelize(records)
recs.registerTempTable("records")

val grouped = recs.select('name, 'n).groupBy('name)(Count('n) as 'count)
grouped.printSchema
// root
//  |-- count: long (nullable = false)

grouped foreach println
// [2]
// [3]


People

Assignee: Unassigned

Reporter: Dean Wampler

Votes: 0

Watchers: 2

Dates

Created: 23/Nov/14 16:47

Updated: 19/Dec/14 21:04

Resolved: 19/Dec/14 21:04