Spark / SPARK-3613

Don't record the size of each shuffle block for large jobs


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: Shuffle, Spark Core
    • Labels: None

    Description

      MapStatus saves the size of each block (1 byte per block) for a particular map task, which means the shuffle metadata is O(M*R), where M = num maps and R = num reduces.
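
      For illustration (hypothetical figures): with M = 10,000 map tasks and R = 10,000 reduce tasks, that is 10^8 block sizes, i.e. roughly 100 MB of metadata at 1 byte per block, which the driver's map output tracker has to keep and ship to reducers.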

      If M is greater than a certain threshold, we should probably just send an average block size instead of the whole array.
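
      A minimal sketch of that idea in Scala (not Spark's actual implementation; the type names, the 2000-block threshold, and averaging over non-empty blocks are all assumptions): keep the exact array while it is small, and switch to a single average size once it grows past the threshold.

      sealed trait BlockSizes {
        def sizeOf(reduceId: Int): Long
      }

      // Exact per-block sizes: O(R) metadata per map task (R = number of reducers).
      final case class ExactBlockSizes(sizes: Array[Long]) extends BlockSizes {
        def sizeOf(reduceId: Int): Long = sizes(reduceId)
      }

      // Compressed form: a single average, so metadata per map task is O(1).
      final case class AverageBlockSize(avgSize: Long, numBlocks: Int) extends BlockSizes {
        def sizeOf(reduceId: Int): Long = avgSize
      }

      object BlockSizes {
        // Hypothetical cutoff; a real threshold would need benchmarking.
        val CompressThreshold = 2000

        def apply(sizes: Array[Long]): BlockSizes =
          if (sizes.length <= CompressThreshold) {
            ExactBlockSizes(sizes)
          } else {
            // Average over non-empty blocks so empty partitions don't drag the estimate to zero.
            val nonEmpty = sizes.count(_ > 0)
            val avg = if (nonEmpty > 0) sizes.sum / nonEmpty else 0L
            AverageBlockSize(avg, sizes.length)
          }
      }

      The cost is that reducers no longer see exact per-block sizes (e.g. for fetch planning or skew detection), which is why the exact form is worth keeping whenever the array is cheap to ship.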

    Attachments

    Issue Links

    Activity

    People

        Assignee: Reynold Xin (rxin)
        Reporter: Reynold Xin (rxin)
        Votes: 0
        Watchers: 3

    Dates

        Created:
        Updated:
        Resolved: