Details
- Type: Improvement
- Priority: Major
- Status: Resolved
- Resolution: Fixed
Description
HighlyCompressedMapStatus uses a single long to track the average block size of all shuffle blocks. However, if a stage has many zero-sized outputs, this is inefficient: the average size is reported for empty blocks as well, so executors send requests to fetch blocks that contain no data.
We can use a compressed bitmap to track the zero-sized blocks and report a size of 0 for them, so executors skip those fetches entirely.
See discussion in https://github.com/apache/spark/pull/2470
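The idea can be illustrated with a minimal sketch (this is not Spark's actual implementation, and the class and method names below are hypothetical): keep a bitmap marking which reduce partitions are empty, and store only the average size of the non-empty blocks. Lookups for empty blocks then return 0, so no fetch request is issued for them.

```python
class HighlyCompressedMapStatusSketch:
    """Hypothetical sketch of the proposed scheme: a bitmap of
    zero-sized blocks plus the average size of non-empty blocks."""

    def __init__(self, block_sizes):
        self.num_blocks = len(block_sizes)
        # Use a Python int as an arbitrary-length bitmap; Spark would
        # use a compressed bitmap (e.g. a roaring bitmap) instead.
        self.empty_bitmap = 0
        for i, size in enumerate(block_sizes):
            if size == 0:
                self.empty_bitmap |= 1 << i
        non_empty = [s for s in block_sizes if s > 0]
        self.avg_size = sum(non_empty) // len(non_empty) if non_empty else 0

    def get_size_for_block(self, reduce_id):
        # Empty blocks report 0, so the executor can skip the fetch.
        if (self.empty_bitmap >> reduce_id) & 1:
            return 0
        return self.avg_size


# Usage: two of four blocks are empty; only they report size 0.
status = HighlyCompressedMapStatusSketch([0, 100, 0, 300])
print(status.get_size_for_block(0))  # 0 (empty, fetch skipped)
print(status.get_size_for_block(1))  # 200 (average of non-empty sizes)
```

A production version would use a compressed bitmap structure rather than a raw integer, since a stage can have tens of thousands of reduce partitions and the empty set is often sparse or dense enough to compress well.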
Issue Links
- relates to
  - SPARK-4019 Shuffling with more than 2000 reducers may drop all data when partitions are mostly empty or cause deserialization errors if at least one partition is empty (Resolved)
  - SPARK-3613 Don't record the size of each shuffle block for large jobs (Resolved)