Description
In SPARK-21603, we saw a performance regression because HotSpot refuses to JIT-compile overly long functions (the limit is 8000 in bytecode size).
I checked and found that the code generated by `HashAggregateExec` frequently exceeds this limit, for example:
spark.range(10000000).selectExpr("id % 1024 AS a", "id AS b").write.saveAsTable("t")
sql("SELECT a, KURTOSIS(b) FROM t GROUP BY a")
This query goes over the limit: the actual bytecode size is `12356`.
So, it might be better to split the aggregation code into pieces.
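The splitting idea above can be sketched as follows. This is a hypothetical illustration, not Spark's actual `CodeGenerator`: it assumes one generated update statement per aggregate buffer slot (the `emitAggUpdate` helper is invented for the example) and shows how chunking those statements into several small helper methods keeps each generated method under the bytecode-size limit, instead of emitting one huge method.

```java
import java.util.stream.IntStream;

public class SplitCodegenSketch {
    // One generated update statement per aggregate buffer slot (illustrative only;
    // real generated code handles nullability, types, etc.).
    static String emitAggUpdate(int i) {
        return "buf[" + i + "] = update(buf[" + i + "], row[" + i + "]);";
    }

    // Instead of one huge doAgg(), group the statements into chunks and emit
    // one private helper per chunk, plus a small driver that calls them in order.
    static String splitIntoMethods(int numExprs, int chunkSize) {
        StringBuilder helpers = new StringBuilder();
        StringBuilder calls = new StringBuilder();
        int m = 0;
        for (int start = 0; start < numExprs; start += chunkSize, m++) {
            calls.append("  doAgg_").append(m).append("();\n");
            helpers.append("private void doAgg_").append(m).append("() {\n");
            IntStream.range(start, Math.min(start + chunkSize, numExprs))
                     .forEach(i -> helpers.append("  ").append(emitAggUpdate(i)).append("\n"));
            helpers.append("}\n");
        }
        return "private void doAgg() {\n" + calls + "}\n" + helpers;
    }

    public static void main(String[] args) {
        // Six aggregate slots, at most two updates per generated method.
        System.out.println(splitIntoMethods(6, 2));
    }
}
```

With six slots and a chunk size of two, the driver calls `doAgg_0()` through `doAgg_2()`, so each generated method stays small regardless of how many aggregate expressions the query has.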
Issue Links
- is contained by
  - SPARK-22600 Fix 64kb limit for deeply nested expressions under wholestage codegen (Resolved)
- is duplicated by
  - SPARK-23791 Sub-optimal generated code for sum aggregating (Resolved)
- is related to
  - SPARK-22105 Dataframe has poor performance when computing on many columns with codegen (Resolved)
- links to