Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
MapStatus stores the size of each block (1 byte per block) produced by a particular map task. The shuffle metadata is therefore O(M*R), where M is the number of map tasks and R is the number of reduce tasks.
If M exceeds some threshold, we should probably send just an average block size instead of the whole array.
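To make the trade-off concrete, here is a minimal sketch of the idea in Python. It is not Spark's actual implementation: the class names `FullMapStatus` and `AverageMapStatus`, the helper `make_map_status`, and the `max_maps` threshold are all hypothetical and exist only for illustration. It shows how replacing the per-block array with a single average reduces the metadata from M*R entries to M entries.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class FullMapStatus:
    """Hypothetical status that keeps one size entry per reduce partition."""
    sizes: List[int]

    def size_for_block(self, reduce_id: int) -> int:
        return self.sizes[reduce_id]

    def metadata_entries(self) -> int:
        # One entry per block, so R entries for this map task.
        return len(self.sizes)


@dataclass
class AverageMapStatus:
    """Hypothetical status that keeps only the average block size."""
    num_blocks: int
    avg_size: int

    def size_for_block(self, reduce_id: int) -> int:
        if not 0 <= reduce_id < self.num_blocks:
            raise IndexError(reduce_id)
        # Every block reports the same (approximate) size.
        return self.avg_size

    def metadata_entries(self) -> int:
        # A single entry regardless of R.
        return 1


def make_map_status(
    sizes: List[int], num_maps: int, max_maps: int
) -> Union[FullMapStatus, AverageMapStatus]:
    """Pick a representation; max_maps is an assumed, tunable threshold."""
    if sizes and num_maps > max_maps:
        return AverageMapStatus(len(sizes), sum(sizes) // len(sizes))
    return FullMapStatus(list(sizes))


def total_metadata_entries(statuses) -> int:
    """Total entries across all map tasks: M*R for full statuses."""
    return sum(s.metadata_entries() for s in statuses)
```

The cost of the compact form is precision: a reducer only sees an estimate of each block's size, so any logic that depends on exact per-block sizes would need to tolerate that approximation.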
Issue Links
- is duplicated by
  - SPARK-4909 "Error communicating with MapOutputTracker" when run a big spark job (Resolved)
- is related to
  - SPARK-3740 Use a compressed bitmap to track zero sized blocks in HighlyCompressedMapStatus (Resolved)
- links to