Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.3.0, 2.3.1, 2.3.2, 2.4.0
- None
- None
Description
In our test environment, we found a serious performance regression in Spark 2.3 when running TPC-DS on SKX 8180. Several queries are significantly slower. For example, on 3 TB of data, TPC-DS Q19 takes 126 seconds with Spark 2.3 but only 29 seconds with Spark 2.1. We investigated the problem and traced the root cause to the community patch SPARK-21052, which adds metrics to the hash join process. The affected code is at L486 and L487. Q19 takes about 30 seconds without these two lines of code and 126 seconds with them.
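To make the kind of overhead described above concrete, here is a minimal, hypothetical sketch of per-lookup metric bookkeeping in a hash map probe loop. It is not Spark's code: `ProbeMap`, `track_metrics`, `num_key_lookups`, and `num_probes` are illustrative names, and the toy linear-probing map only stands in for a join hash table. The point is that the counter updates sit inside the lookup path, which a hash join executes once per probe-side row.

```python
class ProbeMap:
    """Toy open-addressing map with linear probing (illustrative only)."""

    def __init__(self, capacity=16, track_metrics=False):
        # Capacity must be a power of two so `& mask` acts as a cheap modulo.
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self._mask = capacity - 1
        self._keys = [None] * capacity
        self._values = [None] * capacity
        self._size = 0
        self.track_metrics = track_metrics
        self.num_key_lookups = 0  # hypothetical metric: number of get() calls
        self.num_probes = 0       # hypothetical metric: slots inspected by get()

    def put(self, key, value):
        pos = hash(key) & self._mask
        while self._keys[pos] is not None and self._keys[pos] != key:
            pos = (pos + 1) & self._mask
        if self._keys[pos] is None:
            # Always keep one empty slot so that probe loops terminate.
            if self._size >= self._mask:
                raise ValueError("map is full")
            self._size += 1
        self._keys[pos] = key
        self._values[pos] = value

    def get(self, key):
        pos = hash(key) & self._mask
        if self.track_metrics:
            # Extra writes on every lookup: this is the hot-path cost.
            self.num_key_lookups += 1
            self.num_probes += 1
        while self._keys[pos] is not None:
            if self._keys[pos] == key:
                return self._values[pos]
            pos = (pos + 1) & self._mask
            if self.track_metrics:
                self.num_probes += 1
        return None

    def avg_probes_per_lookup(self):
        if self.num_key_lookups == 0:
            return 0.0
        return self.num_probes / self.num_key_lookups
```

With `track_metrics=False` the lookup path does no counter writes, which corresponds loosely to running without the two metric lines. How much such bookkeeping costs in practice depends on the runtime and hardware, so this sketch shows where the overhead sits rather than reproducing the measured slowdown.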
Attachments
Issue Links
- duplicates
- SPARK-26316 Because of the perf degradation in TPC-DS, we currently partial revert SPARK-21052:Add hash map metrics to join,
- Resolved