Description
When using pivot and multiple aggregations we need to alias to avoid special characters, but alias does not help because
df.groupBy("C").pivot("A").agg(avg("D").as("COLD"), max("B").as("COLB")).show
C | bar_avg(`D`) AS `COLD` | bar_max(`B`) AS `COLB` | foo_avg(`D`) AS `COLD` | foo_max(`B`) AS `COLB` |
---|---|---|---|---|
small | 5.5 | two | 2.3333333333333335 | two |
large | 5.5 | two | 2.0 | one |
Expected Output
C | bar_COLD | bar_COLB | foo_COLD | foo_COLB |
---|---|---|---|---|
small | 5.5 | two | 2.3333333333333335 | two |
large | 5.5 | two | 2.0 | one |
One approach you can fix this issue is to change the class
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
and change the outputName method in
object ResolvePivot extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
def outputName(value: Literal, aggregate: Expression): String = { val suffix = aggregate match { case n: NamedExpression => aggregate.asInstanceOf[NamedExpression].name case _ => aggregate.sql } if (singleAgg) value.toString else value + "_" + suffix }
Version : 2.0.0
def outputName(value: Literal, aggregate: Expression): String = { if (singleAgg) value.toString else value + "_" + aggregate.sql }
Attachments
Issue Links
- is duplicated by
-
SPARK-18393 DataFrame pivot output column names should respect aliases
- Resolved
- links to