Description
In SPARK-21603, we saw a performance regression because HotSpot refuses to JIT-compile overly long functions (the limit is 8000 in bytecode size).
I checked and found that the code generated by `HashAggregateExec` frequently exceeds this limit, for example:
spark.range(10000000).selectExpr("id % 1024 AS a", "id AS b").write.saveAsTable("t")
sql("SELECT a, KURTOSIS(b) FROM t GROUP BY a")
This query goes over the limit: the actual bytecode size is `12356`.
So, it might be better to split the aggregation code into pieces.
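The splitting idea above can be sketched as follows. This is a hypothetical illustration, not Spark's actual `CodeGenerator`: it assumes one generated update statement per aggregate buffer slot (the `emitAggUpdate` helper is invented for the example) and shows how chunking those statements into several small helper methods keeps each generated method under the bytecode-size limit, instead of emitting one huge method.

```java
import java.util.stream.IntStream;

public class SplitCodegenSketch {
    // One generated update statement per aggregate buffer slot (illustrative only;
    // real generated code handles nullability, types, etc.).
    static String emitAggUpdate(int i) {
        return "buf[" + i + "] = update(buf[" + i + "], row[" + i + "]);";
    }

    // Instead of one huge doAgg(), group the statements into chunks and emit
    // one private helper per chunk, plus a small driver that calls them in order.
    static String splitIntoMethods(int numExprs, int chunkSize) {
        StringBuilder helpers = new StringBuilder();
        StringBuilder calls = new StringBuilder();
        int m = 0;
        for (int start = 0; start < numExprs; start += chunkSize, m++) {
            calls.append("  doAgg_").append(m).append("();\n");
            helpers.append("private void doAgg_").append(m).append("() {\n");
            IntStream.range(start, Math.min(start + chunkSize, numExprs))
                     .forEach(i -> helpers.append("  ").append(emitAggUpdate(i)).append("\n"));
            helpers.append("}\n");
        }
        return "private void doAgg() {\n" + calls + "}\n" + helpers;
    }

    public static void main(String[] args) {
        // Six aggregate slots, at most two updates per generated method.
        System.out.println(splitIntoMethods(6, 2));
    }
}
```

With six slots and a chunk size of two, the driver calls `doAgg_0()` through `doAgg_2()`, so each generated method stays small regardless of how many aggregate expressions the query has.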
Issue Links
- is contained by
  - SPARK-22600 Fix 64kb limit for deeply nested expressions under wholestage codegen (Resolved)
- is duplicated by
  - SPARK-23791 Sub-optimal generated code for sum aggregating (Resolved)
- is related to
  - SPARK-22105 Dataframe has poor performance when computing on many columns with codegen (Resolved)
- links to