Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
2.3.0
None
Description
ReadAheadInputStream was introduced in apache/spark#18317 to optimize reading spill files from disk.
However, flamegraphs from profiling some workloads that regressed after switching to Spark 2.3 suggest that the hot path for reading small amounts of data (such as readInt) is inefficient: each call takes locks and performs multiple checks.
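The following is a minimal sketch of the kind of fast path this issue points toward. It assumes that the reading thread exclusively owns its current buffer. `FastPathBufferedStream`, `refill`, and the buffer layout are hypothetical names chosen for illustration. This is not Spark's ReadAheadInputStream, nor the patch that resolved this issue. A small read, such as those issued by `DataInputStream.readInt`, is served after a single `hasRemaining()` check. The lock is taken only when the buffer is empty and must be refilled.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical illustration (not Spark's ReadAheadInputStream): small reads are
 * served straight from the active buffer; the lock is only taken on refill.
 */
class FastPathBufferedStream extends InputStream {
    private final InputStream underlying;
    // Assumed to be touched only by the reading thread, so the fast path needs no lock.
    private final ByteBuffer activeBuffer;
    // Stands in for coordination with a read-ahead thread; used only on the slow path.
    private final ReentrantLock stateLock = new ReentrantLock();
    private boolean eof = false;

    FastPathBufferedStream(InputStream underlying, int bufferSize) {
        this.underlying = underlying;
        this.activeBuffer = ByteBuffer.allocate(bufferSize);
        this.activeBuffer.flip(); // start empty: position == limit == 0
    }

    @Override
    public int read() throws IOException {
        // Fast path: a single bounds check, no lock.
        if (activeBuffer.hasRemaining()) {
            return activeBuffer.get() & 0xFF;
        }
        // Slow path: refill under the lock.
        return refill() ? activeBuffer.get() & 0xFF : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0;
        }
        if (!activeBuffer.hasRemaining() && !refill()) {
            return -1;
        }
        int n = Math.min(len, activeBuffer.remaining());
        activeBuffer.get(b, off, n);
        return n;
    }

    /** Refills the active buffer; returns false at end of stream. */
    private boolean refill() throws IOException {
        stateLock.lock();
        try {
            if (eof) {
                return false;
            }
            byte[] arr = activeBuffer.array();
            int n;
            do {
                n = underlying.read(arr, 0, arr.length);
            } while (n == 0);
            if (n < 0) {
                eof = true;
                return false;
            }
            activeBuffer.clear();  // position = 0, limit = capacity
            activeBuffer.limit(n); // expose only the bytes actually read
            return true;
        } finally {
            stateLock.unlock();
        }
    }

    @Override
    public void close() throws IOException {
        underlying.close();
    }
}

public class Main {
    public static void main(String[] args) throws IOException {
        byte[] data = ByteBuffer.allocate(8).putInt(42).putInt(7).array();
        // A 3-byte buffer forces refills that straddle int boundaries.
        try (DataInputStream in = new DataInputStream(
                new FastPathBufferedStream(new ByteArrayInputStream(data), 3))) {
            System.out.println(in.readInt()); // 42
            System.out.println(in.readInt()); // 7
        }
    }
}
```

Keeping the lock and the extra state checks off the per-byte path means their cost is paid once per buffer refill rather than once per `readInt`. The tiny buffer in `main` exists only to exercise the slow path.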
Issue Links
- is a child of: SPARK-23310 Perf regression introduced by SPARK-21113 (Resolved)
- links to