[SPARK-17458] Alias specified for aggregates in a pivot are not honored - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0
Fix Version/s: 2.0.3, 2.1.0
Component/s: SQL
Labels: None

Description

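When using pivot with multiple aggregations, the aggregates need aliases to keep special characters out of the output column names. The aliases do not help, however, because the generated column names still contain the full aggregate SQL expression: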

df.groupBy("C").pivot("A").agg(avg("D").as("COLD"), max("B").as("COLB")).show

C	bar_avg(`D`) AS `COLD`	bar_max(`B`) AS `COLB`	foo_avg(`D`) AS `COLD`	foo_max(`B`) AS `COLB`
small	5.5	two	2.3333333333333335	two
large	5.5	two	2.0	one

Expected Output

C	bar_COLD	bar_COLB	foo_COLD	foo_COLB
small	5.5	two	2.3333333333333335	two
large	5.5	two	2.0	one
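
Until a fixed version is available, the generated names can be cleaned up after the pivot. The following is a minimal sketch, not part of the reported fix: `PivotColumnNames` and `clean` are hypothetical names, and the pattern assumes the column format shown in the actual output above (pivot value, underscore, aggregate SQL, then AS and a backquoted alias). It also assumes the pivot value itself contains no underscore.

```scala
object PivotColumnNames {
  // Matches "<pivotValue>_<aggregate SQL> AS `<alias>`".
  // The lazy first group stops at the first underscore, so pivot values
  // that contain an underscore are not handled by this sketch.
  private val Aliased = """^(.*?)_.* AS `([^`]+)`$""".r

  // Returns "<pivotValue>_<alias>" for aliased pivot columns and leaves
  // every other column name (for example the grouping column) unchanged.
  def clean(column: String): String = column match {
    case Aliased(pivotValue, alias) => s"${pivotValue}_$alias"
    case other                      => other
  }
}
```

With a pivoted DataFrame `df`, the cleaned names could then be applied with something like `df.toDF(df.columns.map(PivotColumnNames.clean): _*)`.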

One way to fix this issue is to change the outputName method of ResolvePivot in
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
so that it uses the alias when the aggregate is a named expression:

object ResolvePivot extends Rule[LogicalPlan] {
    def apply(plan: LogicalPlan): LogicalPlan = plan transform {

        def outputName(value: Literal, aggregate: Expression): String = {
          val suffix = aggregate match {
            // Use the alias when one was supplied; otherwise fall back to the SQL text.
            case n: NamedExpression => n.name
            case _ => aggregate.sql
          }
          if (singleAgg) value.toString else value + "_" + suffix
        }

Current implementation in version 2.0.0:

        def outputName(value: Literal, aggregate: Expression): String = {
          if (singleAgg) value.toString else value + "_" + aggregate.sql
        }
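
To make the difference between the two versions concrete, here is a self-contained sketch that runs without Spark. The types `Expression`, `NamedExpression`, `Agg` and `Alias` are simplified, hypothetical stand-ins for the Catalyst classes, the pivot value is a plain `String` instead of a `Literal`, and `singleAgg` is passed as a parameter rather than captured from the enclosing rule.

```scala
// Simplified stand-ins for Catalyst's expression classes (assumptions, not the real API).
sealed trait Expression { def sql: String }
trait NamedExpression extends Expression { def name: String }

// An un-aliased aggregate, represented only by its SQL text.
final case class Agg(sql: String) extends Expression

// An aliased expression: its SQL text includes the alias, its name is the alias.
final case class Alias(child: Expression, name: String) extends NamedExpression {
  def sql: String = s"${child.sql} AS `$name`"
}

object PivotNaming {
  // Behaviour in 2.0.0: the suffix is always the aggregate's SQL text.
  def outputNameOld(value: String, aggregate: Expression, singleAgg: Boolean): String =
    if (singleAgg) value else value + "_" + aggregate.sql

  // Proposed behaviour: prefer the alias when the aggregate is named.
  def outputNameNew(value: String, aggregate: Expression, singleAgg: Boolean): String = {
    val suffix = aggregate match {
      case n: NamedExpression => n.name
      case _                  => aggregate.sql
    }
    if (singleAgg) value else value + "_" + suffix
  }
}
```

For `Alias(Agg("avg(`D`)"), "COLD")` and pivot value `bar`, the old method yields the name seen in the actual output, while the new one yields `bar_COLD`. Un-aliased aggregates keep the SQL-based suffix in both versions.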

Issue Links

is duplicated by: SPARK-18393 DataFrame pivot output column names should respect aliases (Resolved)

links to:
[Github] Pull Request #15111 (aray)
[Github] Pull Request #16565 (maropu)

People

Assignee: Andrew Ray
Reporter: Ravi Somepalli
Votes: 0
Watchers: 4

Dates

Created: 08/Sep/16 22:38
Updated: 15/Jan/17 07:40
Resolved: 15/Sep/16 19:49