[SPARK-23366] Improve hot reading path in ReadAheadInputStream - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.4.0
Component/s: Spark Core
Labels: None

Description

ReadAheadInputStream was introduced in apache/spark#18317 to optimize reading spill files from disk.
However, flame graphs from profiling workloads that regressed after the switch to Spark 2.3 show that the hot path for reading small amounts of data (such as readInt) is inefficient: each call takes locks and performs multiple checks.
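
To illustrate the kind of fast path this issue is after, here is a minimal sketch, not the actual Spark code. The class FastPathInputStream, its fields, and the slowPathCalls counter are hypothetical names introduced for illustration, and the lock only stands in for the coordination a real read-ahead stream needs with its background reader. The idea is that a single-byte read is served straight from the active buffer when data is available, and the lock is taken only when the buffer runs dry.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified illustration; NOT Spark's ReadAheadInputStream.
class FastPathInputStream extends InputStream {
    private final InputStream underlying;
    // Stands in for the lock shared with a background read-ahead thread.
    private final ReentrantLock lock = new ReentrantLock();
    // Assumption: only the consuming thread touches the active buffer.
    private final ByteBuffer activeBuffer;
    // Counts how often the locked slow path runs (for demonstration only).
    int slowPathCalls = 0;

    FastPathInputStream(InputStream underlying, int bufferSize) {
        this.underlying = underlying;
        this.activeBuffer = ByteBuffer.allocate(bufferSize);
        this.activeBuffer.flip(); // start with an empty buffer
    }

    @Override
    public int read() throws IOException {
        // Fast path: no lock and a single check while buffered data remains.
        if (activeBuffer.hasRemaining()) {
            return activeBuffer.get() & 0xFF;
        }
        // Slow path: buffer exhausted, so refill under the lock.
        return refillAndRead();
    }

    private int refillAndRead() throws IOException {
        lock.lock();
        try {
            slowPathCalls++;
            activeBuffer.clear();
            int n = underlying.read(activeBuffer.array(), 0, activeBuffer.capacity());
            if (n <= 0) {
                activeBuffer.limit(0); // keep the buffer empty at end of stream
                return -1;
            }
            activeBuffer.limit(n);
            return activeBuffer.get() & 0xFF;
        } finally {
            lock.unlock();
        }
    }
}
```

With a 4-byte buffer over a 5-byte source, only the reads that find the buffer empty reach the locked path; the others return directly from the buffer.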

Issue Links

is a child of

SPARK-23310 Perf regression introduced by SPARK-21113 (Resolved)

links to

[Github] Pull Request #20555 (juliuszsompolski)

People

Assignee: Juliusz Sompolski

Reporter: Juliusz Sompolski

Votes: 0

Watchers: 3

Dates

Created: 09/Feb/18 03:47

Updated: 15/Feb/18 09:09

Resolved: 15/Feb/18 09:09