[SPARK-23310] Perf regression introduced by SPARK-21113 - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Done
Affects Version/s: 2.3.0
Fix Version/s: 2.3.0
Component/s: Spark Core
Labels: None
Target Version/s: 2.3.0

Description

While running all TPC-DS queries at scale factor (SF) 1000, we noticed that Q95 (https://github.com/databricks/spark-sql-perf/blob/master/src/main/resources/tpcds_2_4/q95.sql) shows a noticeable regression (11%). After looking into it, we found that the regression was introduced by SPARK-21113. Specifically, ReadAheadInputStream suffers from lock contention. After setting spark.unsafe.sorter.spill.read.ahead.enabled to false, the regression disappears and the overall performance of all TPC-DS queries improves.

I propose that we set spark.unsafe.sorter.spill.read.ahead.enabled to false by default for Spark 2.3 and re-enable it after addressing the lock contention issue.
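
Until the default changes, the workaround described above can be applied per deployment. A minimal sketch, assuming the setting is supplied through conf/spark-defaults.conf (the property name is taken from this issue; the choice of file is an assumption, and any other standard way of passing Spark properties should work as well):

```properties
# conf/spark-defaults.conf
# Disable the read-ahead input stream in the unsafe sorter spill reader
# to avoid the lock contention reported in this issue.
spark.unsafe.sorter.spill.read.ahead.enabled  false
```

The same property can presumably also be passed for a single job with `--conf spark.unsafe.sorter.spill.read.ahead.enabled=false` on `spark-submit`.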

Issue Links

is a parent of

SPARK-23366 Improve hot reading path in ReadAheadInputStream (Resolved)

is caused by

SPARK-21113 Support for read ahead input stream to amortize disk IO cost in the Spill reader (Resolved)

links to

[Github] Pull Request #20492 (sitalkedia)

[Github] Pull Request #20514 (ueshin)

People

Assignee: Sital Kedia

Reporter: Yin Huai

Votes: 0

Watchers: 10

Dates

Created: 02/Feb/18 02:10

Updated: 12/Feb/18 18:35

Resolved: 05/Feb/18 18:21