Description
The PySpark percentile_approx function does not handle its "accuracy" parameter correctly when used together with a pivot on a column: the pivot rewrite wraps the accuracy (and percentage) arguments in conditional expressions, so they are no longer constant literals and the query below fails during analysis. The same failure occurs if the accuracy parameter is left unspecified:
import pyspark.sql.functions as F

df = sc.parallelize([
    ["a", -1.0],
    ["a", 5.5],
    ["a", 2.5],
    ["b", 3.0],
    ["b", 5.2]
]).toDF(["type", "value"])

result = (
    df.groupBy()
    .pivot("type", ["a", "b"])
    .agg(F.percentile_approx("value", [0.5], 10000).alias("percentiles"))
)
Error message:
AnalysisException: cannot resolve 'percentile_approx((IF((`type` <=> CAST('a' AS STRING)), `value`, CAST(NULL AS DOUBLE))), (IF((`type` <=> CAST('a' AS STRING)), array(0.5D), NULL)), (IF((`type` <=> CAST('a' AS STRING)), 10000, CAST(NULL AS INT))))' due to data type mismatch: The accuracy or percentage provided must be a constant literal; 'Aggregate percentile_approx(if ((type#242 <=> cast(a as string))) value#243 else cast(null as double), if ((type#242 <=> cast(a as string))) array(0.5) else cast(null as array<double>), if ((type#242 <=> cast(a as string))) 10000 else cast(null as int), 0, 0) AS a#251, percentile_approx(if ((type#242 <=> cast(b as string))) value#243 else cast(null as double), if ((type#242 <=> cast(b as string))) array(0.5) else cast(null as array<double>), if ((type#242 <=> cast(b as string))) 10000 else cast(null as int), 0, 0) AS b#253 +- LogicalRDD type#242, value#243, false
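For reference, a possible workaround (a minimal sketch, assuming Spark 3.1+ and an active SparkSession named spark, not part of the original report): compute the approximate percentiles grouped by type first, so percentile_approx keeps literal percentage and accuracy arguments, and only then pivot the finished values with first(), which has no literal-argument restriction:

import pyspark.sql.functions as F

# Assumes an existing SparkSession named `spark`.
df = spark.createDataFrame(
    [("a", -1.0), ("a", 5.5), ("a", 2.5), ("b", 3.0), ("b", 5.2)],
    ["type", "value"],
)

# Aggregate per type first, so the percentage and accuracy arguments
# passed to percentile_approx stay constant literals.
percentiles = df.groupBy("type").agg(
    F.percentile_approx("value", [0.5], 10000).alias("percentiles")
)

# Pivot the already-computed values; first() does not require literal arguments.
result = (
    percentiles.groupBy()
    .pivot("type", ["a", "b"])
    .agg(F.first("percentiles"))
)
result.show()

This avoids the pivot rewrite wrapping the percentage and accuracy literals in IF expressions, which is what trips the "must be a constant literal" check in the error above.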