Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Invalid
- Affects Version/s: 3.2.1
- None
- None
hadoop: 3.0.0
spark: 2.4.0 / 3.2.1
shuffle: spark 2.4.0
Description
spark.sql(
  s"""
     |SELECT
     |  Info,
     |  PERCENTILE_APPROX(cost, 0.5)   AS cost_p50,
     |  PERCENTILE_APPROX(cost, 0.9)   AS cost_p90,
     |  PERCENTILE_APPROX(cost, 0.95)  AS cost_p95,
     |  PERCENTILE_APPROX(cost, 0.99)  AS cost_p99,
     |  PERCENTILE_APPROX(cost, 0.999) AS cost_p999
     |FROM
     |  textData
     |GROUP BY
     |  Info
     |""".stripMargin)
- With Spark 2.4.0, the aggregation used ObjectHashAggregate, and stage 2 pulled the shuffle data very quickly. With Spark 3.2.1 running against the old shuffle service (Spark 2.4.0), pulling 140M of shuffle data takes 3 hours.
- If we upgrade the shuffle service, will we get a performance regression?
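To compare the two Spark versions, it may help to confirm which aggregate operator each one actually plans for this query. The following is a minimal sketch, not the reporter's job: the local session, the four-row stand-in for `textData`, the object name `PercentilePlanCheck`, and the `GROUP BY Info` clause are all assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession

object PercentilePlanCheck {
  def main(args: Array[String]): Unit = {
    // Local session purely for illustration; the report ran on a cluster.
    val spark = SparkSession.builder()
      .appName("percentile-plan-check")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the reporter's textData table.
    Seq(("a", 1.0), ("a", 5.0), ("b", 2.0), ("b", 9.0))
      .toDF("Info", "cost")
      .createOrReplaceTempView("textData")

    val df = spark.sql(
      """
        |SELECT Info, PERCENTILE_APPROX(cost, 0.5) AS cost_p50
        |FROM textData
        |GROUP BY Info
        |""".stripMargin)

    // Render the physical plan as text and look for the operator name.
    val plan = df.queryExecution.executedPlan.toString
    println(plan)
    println(s"uses ObjectHashAggregate: ${plan.contains("ObjectHashAggregate")}")

    spark.stop()
  }
}
```

Running the same check on both versions shows whether the planned operator differs. The setting `spark.sql.execution.useObjectHashAggregateExec` controls whether Spark may choose the object hash aggregate, so toggling it is one way to test whether the operator or the shuffle fetch is responsible for the slowdown.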
Attachments
Issue Links
- is related to SPARK-46706 percentile_approx regression since Spark 2.4 (Open)