[SPARK-19244] Sort MemoryConsumers according to their memory usage when spilling - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.2.0
Component/s: Spark Core
Labels:
None

Description

In `TaskMemoryManager `, when we acquire memory by calling `acquireExecutionMemory` and we can't acquire required memory, we will try to spill other memory consumers.

Currently, we simply iterates the memory consumers in a hash set. Normally each time the consumer will be iterated in the same order.

The first issue is that we might spill additional consumers. For example, if consumer 1 uses 10MB, consumer 2 uses 50MB, then consumer 3 acquires 100MB but we can only get 60MB and spilling is needed. We might spill both consumer 1 and consumer 2. But we actually just need to spill consumer 2 and get the required 100MB.

The second issue is that if we spill consumer 1 in first time spilling. After a while, consumer 1 now uses 5MB. Then consumer 4 may acquire some memory and spilling is needed again. Because we iterate the memory consumers in the same order, we will spill consumer 1 again. So for consumer 1, we will produce many small spilling files.

Attachments

Issue Links

links to

[Github] Pull Request #16603 (viirya)

Activity

People

Assignee:: L. C. Hsieh

Reporter:: L. C. Hsieh

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 16/Jan/17 13:31

Updated:: 03/Feb/17 14:18

Resolved:: 03/Feb/17 14:15