[SPARK-20184] performance regression for complex/long sql when enable whole stage codegen - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 1.6.0, 2.1.0
Fix Version/s: None
Component/s: SQL
Labels:
- bulk-closed

Description

The following SQL performs much worse in Spark 2.x with whole-stage codegen enabled than with it disabled.

SELECT
sum(COUNTER_57)
,sum(COUNTER_71)
,sum(COUNTER_3)
,sum(COUNTER_70)
,sum(COUNTER_66)
,sum(COUNTER_75)
,sum(COUNTER_69)
,sum(COUNTER_55)
,sum(COUNTER_63)
,sum(COUNTER_68)
,sum(COUNTER_56)
,sum(COUNTER_37)
,sum(COUNTER_51)
,sum(COUNTER_42)
,sum(COUNTER_43)
,sum(COUNTER_1)
,sum(COUNTER_76)
,sum(COUNTER_54)
,sum(COUNTER_44)
,sum(COUNTER_46)
,DIM_1
,DIM_2
,DIM_3
FROM aggtable
GROUP BY DIM_1, DIM_2, DIM_3
LIMIT 100;

aggtable has about 35,000,000 rows.

Whole-stage codegen on (spark.sql.codegen.wholeStage = true): 40s
Whole-stage codegen off (spark.sql.codegen.wholeStage = false): 6s
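
For anyone trying to reproduce the two timings, a minimal PySpark sketch is given here. It is not part of the original report: the helper names `time_query` and `main`, the app name, and the shortened query are illustrative assumptions, and it presumes a table named aggtable is already registered in the session. The only Spark setting it touches is spark.sql.codegen.wholeStage, the one named above.

```python
import time

# Abbreviated stand-in for the query in the description (assumption: the
# full statement with all 20 sum(...) columns would be used in practice).
SHORT_QUERY = (
    "SELECT sum(COUNTER_57), sum(COUNTER_71), DIM_1, DIM_2, DIM_3 "
    "FROM aggtable GROUP BY DIM_1, DIM_2, DIM_3 LIMIT 100"
)


def time_query(spark, sql_text, whole_stage):
    """Run sql_text once with whole-stage codegen on or off; return seconds."""
    # The setting is a runtime SQL conf, so it can be flipped per session.
    spark.conf.set("spark.sql.codegen.wholeStage", str(whole_stage).lower())
    start = time.perf_counter()
    spark.sql(sql_text).collect()  # collect() forces the query to execute
    return time.perf_counter() - start


def main():
    # Imported lazily so the helper above can be exercised without PySpark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("SPARK-20184-repro").getOrCreate()
    for flag in (True, False):
        elapsed = time_query(spark, SHORT_QUERY, flag)
        print(f"wholeStage={flag}: {elapsed:.1f}s")
    spark.stop()
```

Calling `main()` on a cluster that holds the table prints one timing per setting. A single run per setting is noisy, so repeating each measurement a few times is advisable before comparing numbers.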

After some analysis, I think this is caused by the huge Java method (on the order of a thousand lines) that whole-stage codegen generates.
With -XX:-DontCompileHugeMethods set on the JVM, performance is much better (about 7s).
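
As background for the flag: HotSpot by default does not JIT-compile methods whose bytecode exceeds 8000 bytes, so such a method stays in the interpreter unless DontCompileHugeMethods is turned off. The report does not say how the flag was passed to Spark. One plausible way, sketched here as an assumption, is through spark-submit's `--driver-java-options` option and the `spark.executor.extraJavaOptions` setting. The helper `spark_submit_args` and the script name `repro.py` are hypothetical.

```python
import shlex

# JVM flag from the report; the leading "-" after "XX:" disables the option.
JVM_FLAG = "-XX:-DontCompileHugeMethods"


def spark_submit_args(app, jvm_flag=JVM_FLAG):
    """Build a spark-submit argv that applies jvm_flag to driver and executors."""
    return [
        "spark-submit",
        # Driver JVM options are given on the command line because the
        # driver may already be running by the time application code executes.
        "--driver-java-options", jvm_flag,
        # Executor JVMs pick the flag up from this Spark configuration key.
        "--conf", f"spark.executor.extraJavaOptions={jvm_flag}",
        app,
    ]


# "repro.py" is a placeholder application name, not from the report.
print(shlex.join(spark_submit_args("repro.py")))
```

The printed command can be pasted into a shell. Both driver and executors are covered because the generated code may run in either JVM, depending on deploy mode and on where the aggregation executes.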

People

Assignee: Unassigned
Reporter: Fei Wang
Votes: 0
Watchers: 10

Dates

Created: 01/Apr/17 06:05
Updated: 25/May/21 01:50
Resolved: 25/May/21 01:43