Spark / SPARK-2773

Shuffle: use growth rate to predict whether to spill


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 0.9.0, 1.0.0
    • Fix Version/s: None
    • Component/s: Shuffle, Spark Core
    • Labels: None

    Description

  Right now, Spark uses the total "shuffle" memory usage of each thread to decide whether to spill. I think this is not very reasonable. For example, suppose two threads are pulling shuffle data and the total memory available for buffering is 21 GB. Spilling is first triggered when one thread has used 7 GB to buffer shuffle data (assuming the other thread has used the same amount). Unfortunately, that still leaves 7 GB unused. So I think the current prediction mode is too conservative and cannot maximize the usage of shuffle memory. In my solution, I use the growth rate of shuffle memory instead: the growth per step is bounded, maybe 10K * 1024 (my assumption), so spilling is first triggered only when the remaining shuffle memory is less than threads * growth * 2, i.e. 2 * 10M * 2. I think this maximizes the usage of shuffle memory. My solution also makes a conservative assumption, namely that all threads in one executor are pulling shuffle data at the same time, but this does not have much effect, since the growth per step is bounded anyway. Any suggestions?
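  The proposed check could be sketched as below. This is only an illustration of the heuristic described above, not Spark's actual spill logic; the class name, the 10 MB growth bound, and the safety factor of 2 are assumptions taken from the description's example numbers.

  ```java
  // Sketch of the proposed growth-rate-based spill predictor (hypothetical).
  public class GrowthRateSpillPredictor {
      private final long totalShuffleMemory; // total memory available for shuffle buffering
      private final long maxGrowthPerStep;   // assumed upper bound on per-step growth, e.g. 10 MB
      private static final int SAFETY_FACTOR = 2;

      public GrowthRateSpillPredictor(long totalShuffleMemory, long maxGrowthPerStep) {
          this.totalShuffleMemory = totalShuffleMemory;
          this.maxGrowthPerStep = maxGrowthPerStep;
      }

      // Spill only when the remaining memory could be exhausted within the next
      // growth steps, conservatively assuming every thread grows at the maximum rate.
      public boolean shouldSpill(long usedByAllThreads, int activeThreads) {
          long remaining = totalShuffleMemory - usedByAllThreads;
          return remaining < (long) activeThreads * maxGrowthPerStep * SAFETY_FACTOR;
      }

      public static void main(String[] args) {
          // 21 GB pool, 10 MB max growth per step, 2 threads =>
          // spill threshold is 2 * 10 MB * 2 = 40 MB remaining.
          GrowthRateSpillPredictor p =
              new GrowthRateSpillPredictor(21L << 30, 10L << 20);
          // 14 GB used: 7 GB remains, far above 40 MB, so no spill yet.
          System.out.println(p.shouldSpill(14L << 30, 2));
          // Only 30 MB remains, below the 40 MB threshold, so spill now.
          System.out.println(p.shouldSpill((21L << 30) - (30L << 20), 2));
      }
  }
  ```

  Under the current per-thread scheme in the description's example, spilling would already fire at 7 GB per thread with 7 GB still free; under this sketch it fires only in the last 40 MB.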



          People

            Assignee: Unassigned
            Reporter: Genmao Yu (uncleGen)
