[SPARK-18393] DataFrame pivot output column names should respect aliases - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: SQL
Labels: None

Description

For example:

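// Needed outside spark-shell: expr(...) and the 'x symbol-to-column conversion.
import org.apache.spark.sql.functions.expr
import spark.implicits._

val df = spark.range(100).selectExpr("id % 5 as x", "id % 2 as a", "id as b")
df
  .groupBy('x)
  .pivot("a", Seq(0, 1))
  .agg(expr("sum(b)").as("blah"), expr("count(b)").as("foo"))
  .show()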
+---+--------------------+---------------------+--------------------+---------------------+
|  x|0_sum(`b`) AS `blah`|0_count(`b`) AS `foo`|1_sum(`b`) AS `blah`|1_count(`b`) AS `foo`|
+---+--------------------+---------------------+--------------------+---------------------+
|  0|                 450|                   10|                 500|                   10|
|  1|                 510|                   10|                 460|                   10|
|  3|                 530|                   10|                 480|                   10|
|  2|                 470|                   10|                 520|                   10|
|  4|                 490|                   10|                 540|                   10|
+---+--------------------+---------------------+--------------------+---------------------+

The generated column names embed the full aggregate expression text, which makes them hard to read. Ideally, pivot would respect the aliases and generate column names like 0_blah, 0_foo, 1_blah, 1_foo instead.
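Until the aliases are honored, one possible workaround is to rename the pivoted columns after the fact. The sketch below is only an illustration, not part of Spark: `shortPivotName` is a hypothetical helper, and it assumes the column names have exactly the shape shown in the output above (pivot value, underscore, expression text, then AS and a backquoted alias) and that the pivot values themselves contain no underscores.

```scala
// Hypothetical helper (not a Spark API): shortens a pivot column name such as
//   0_sum(`b`) AS `blah`
// to
//   0_blah
// Assumption: the pivot value contains no underscore, so the first underscore
// separates the pivot value from the aggregate expression text.
object PivotNames {
  // Group 1: pivot value (shortest prefix before an underscore).
  // Group 2: the alias inside the trailing backquotes.
  private val Aliased = """^(.+?)_.* AS `([^`]+)`$""".r

  def shortPivotName(name: String): String = name match {
    case Aliased(pivotValue, alias) => s"${pivotValue}_$alias"
    case other                      => other // e.g. the grouping column "x"
  }
}
```

Applied to a pivoted DataFrame named, say, pivoted, the helper could be used as pivoted.toDF(pivoted.columns.map(PivotNames.shortPivotName): _*). Names that do not match the pattern, such as the grouping column x, pass through unchanged.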

Issue Links

duplicates

SPARK-17458 Alias specified for aggregates in a pivot are not honored

Resolved

People

Assignee: Unassigned

Reporter: Eric Liang

Votes: 0

Watchers: 5

Dates

Created: 10/Nov/16 00:10

Updated: 12/Dec/22 18:11

Resolved: 18/Nov/16 10:32