[MAPREDUCE-6541] Exclude scheduled reducer memory when calculating available mapper slots from headroom to avoid deadlock - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.7.1
Fix Version/s: 2.8.0, 3.0.0-alpha2
Component/s: None
Labels: None
Target Version/s: 2.7.4
Hadoop Flags: Reviewed

Description

We recently saw an MR deadlock:

When an NM is restarted by the framework without recovery enabled, containers running on that node are identified as "ABORTED", and the MR AM tries to reschedule the "ABORTED" mapper containers.
Because these lost mappers are "ABORTED" containers, the MR AM gives their requests the normal mapper priority (priority=20). If any reducer requests (priority=10) are pending at the same time, the mapper requests must wait until the reducer requests are satisfied.
In our test, one mapper needs 700+ MB, one reducer needs 1000+ MB, and the RM's available resource equals a single mapper request (700+ MB). Only one job was running in the system, so the scheduler cannot allocate any more reducer containers, and the MR AM thinks there is enough headroom for a mapper, so reducer containers are not preempted.

~~MAPREDUCE-6302~~ can solve most of these problems, but on the other hand, I think we may need to exclude the resources of scheduled reducers when calculating #available-mapper-slots from headroom. That way we can also avoid excessive reducer preemption.
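
To make the proposal concrete, here is a minimal sketch of the two ways of counting mapper slots. It is not the attached patch and does not use the MR AM's real classes. The class `HeadroomSketch`, its method names, the exact memory sizes (768 MB and 1024 MB standing in for "700+ MB" and "1000+ MB"), and the rule "preempt when no mapper fits" are all assumptions made for illustration.

```java
// Illustrative only: hypothetical names, not code from the MR AM or the patch.
public final class HeadroomSketch {
    private HeadroomSketch() {}

    /** Naive count: all of the headroom is assumed to be usable by mappers. */
    public static int naiveMapperSlots(long headroomMb, long mapperMb) {
        return (int) (headroomMb / mapperMb);
    }

    /**
     * Proposed count: first subtract the memory that scheduled (pending)
     * reducer requests will claim, because the scheduler serves them before
     * the lower-priority mapper requests.
     */
    public static int mapperSlotsExcludingReducers(
            long headroomMb, long mapperMb, int scheduledReducers, long reducerMb) {
        long usableMb = headroomMb - (long) scheduledReducers * reducerMb;
        if (usableMb <= 0) {
            return 0; // nothing is left for mappers once reducers are counted
        }
        return (int) (usableMb / mapperMb);
    }

    /** Simplified rule (an assumption): preempt reducers if no pending mapper fits. */
    public static boolean shouldPreemptReducers(int pendingMappers, int mapperSlots) {
        return pendingMappers > 0 && mapperSlots == 0;
    }

    public static void main(String[] args) {
        long headroomMb = 768;     // assumed stand-in for "700+ MB" available
        long mapperMb = 768;       // assumed mapper container size
        long reducerMb = 1024;     // assumed stand-in for "1000+ MB"
        int scheduledReducers = 1; // one reducer request still pending
        int pendingMappers = 1;    // the rescheduled "ABORTED" mapper

        int naive = naiveMapperSlots(headroomMb, mapperMb);
        int adjusted = mapperSlotsExcludingReducers(
                headroomMb, mapperMb, scheduledReducers, reducerMb);

        System.out.println("naive slots=" + naive
                + " preempt=" + shouldPreemptReducers(pendingMappers, naive));
        System.out.println("adjusted slots=" + adjusted
                + " preempt=" + shouldPreemptReducers(pendingMappers, adjusted));
    }
}
```

With these assumed numbers the naive count reports one free mapper slot, so no reducer is preempted and the job stalls. The adjusted count reports zero slots, which triggers preemption and lets the mapper run.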

Attachments

- MAPREDUCE-6541.01.patch, 12/Nov/15 14:20, 9 kB, Varun Saxena
- MAPREDUCE-6541.02.patch, 27/Oct/16 11:00, 8 kB, Varun Saxena

Issue Links

relates to MAPREDUCE-6513 MR job got hanged forever when one NM unstable for some time (Closed)

People

Assignee: Varun Saxena

Reporter: Wangda Tan

Votes: 0

Watchers: 13

Dates

Created: 06/Nov/15 21:47

Updated: 06/Jan/17 08:09

Resolved: 27/Oct/16 12:43