Hadoop Map/Reduce
MAPREDUCE-6513

MR job hangs forever when one NM is unstable for some time

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      While a job with a large number of tasks was in progress, one node became unstable due to an OS issue. After the node became unstable, the maps that had run on this node changed to the KILLED state.

      The maps which had been running on the unstable node were rescheduled; all of them stayed in the SCHEDULED state, waiting for the RM to assign containers. Ask requests for the maps were seen until the node became good again (all of those failed), and there were no ask requests after that. But the AM keeps on preempting the reducers (it keeps recycling them).

      In the end, the reducers are waiting for the mappers to complete, and the mappers never get containers.

      My question is:
      ============
      Why did the AM not send the map requests again once the node recovered?
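
      Below is a minimal, self-contained sketch (not the actual Hadoop source) of why the rescheduled maps can starve: the AM asks for reduces at a numerically lower, and therefore more important, YARN priority than maps, so while reduce asks are outstanding the RM keeps handing back reduce-priority containers. The priority values used here are assumptions for illustration.

      // Illustration only: the priority values are assumptions, not a quote of
      // RMContainerAllocator.
      import java.util.ArrayList;
      import java.util.Comparator;
      import java.util.List;

      public class PriorityStarvationSketch {
        static final int PRIORITY_REDUCE = 10; // assumed: reduce asks use a lower (more important) number
        static final int PRIORITY_MAP = 20;    // assumed: map asks use a higher (less important) number

        record Ask(String type, int priority) {}

        public static void main(String[] args) {
          List<Ask> asks = new ArrayList<>();
          asks.add(new Ask("651 reduces (already outstanding)", PRIORITY_REDUCE));
          asks.add(new Ask("16 rescheduled maps", PRIORITY_MAP));

          // A numerically lower priority is served first, so reduce-priority
          // containers keep coming back while the rescheduled maps wait.
          asks.sort(Comparator.comparingInt(Ask::priority));
          System.out.println("RM serves asks in this order: " + asks);
        }
      }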

      1. MAPREDUCE-6513.01.patch
        37 kB
        Varun Saxena
      2. MAPREDUCE-6513.02.patch
        34 kB
        Varun Saxena
      3. MAPREDUCE-6513.03.patch
        34 kB
        Varun Saxena
      4. MAPREDUCE-6513.3_1.branch-2.7.patch
        38 kB
        Wangda Tan
      5. MAPREDUCE-6513.3_1.branch-2.8.patch
        34 kB
        Wangda Tan
      6. MAPREDUCE-6513.3.branch-2.8.patch
        34 kB
        Wangda Tan

        Issue Links

          Activity

          varun_saxena Varun Saxena added a comment -

          Took the logs from Bob offline for analysis. The scenario is as follows:
          1. All the maps have completed.

          2015-10-13 04:38:42,229 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:651 AssignedMaps:0 AssignedReds:0 CompletedMaps:78 CompletedReds:0 ContAlloc:79 ContRel:1 HostLocal:64 RackLocal:14

          2. One node becomes unstable, and hence some of the map tasks that had succeeded on that node are killed

          2015-10-13 04:53:41,127 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node hdszzdcxdat6g05u06p:26009. AttemptId:attempt_1437451211867_1485_m_000077_0
          2015-10-13 04:53:41,128 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node hdszzdcxdat6g05u06p:26009. AttemptId:attempt_1437451211867_1485_m_000026_0
          2015-10-13 04:53:41,128 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node hdszzdcxdat6g05u06p:26009. AttemptId:attempt_1437451211867_1485_m_000007_0
          2015-10-13 04:53:41,128 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node hdszzdcxdat6g05u06p:26009. AttemptId:attempt_1437451211867_1485_m_000034_0
          2015-10-13 04:53:41,128 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node hdszzdcxdat6g05u06p:26009. AttemptId:attempt_1437451211867_1485_m_000015_0
          2015-10-13 04:53:41,128 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node hdszzdcxdat6g05u06p:26009. AttemptId:attempt_1437451211867_1485_m_000036_0
          

          3. As can be seen below, 16 maps are now scheduled

          2015-10-13 04:53:42,128 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:16 ScheduledReds:651 AssignedMaps:0 AssignedReds:0 CompletedMaps:62 CompletedReds:0 ContAlloc:79 ContRel:1 HostLocal:64 RackLocal:14

          4. Node comes back up again after a while.

          5. After this, we keep seeing the reducers get preempted and scheduled again, over and over in a cycle, and the mappers are never assigned (due to their lower priority).

          2015-10-13 04:38:40,219 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:651 AssignedMaps:2 AssignedReds:0 CompletedMaps:78 CompletedReds:0 ContAlloc:79 ContRel:1 HostLocal:64 RackLocal:14
          2015-10-13 04:38:40,223 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:651 AssignedMaps:1 AssignedReds:0 CompletedMaps:78 CompletedReds:0 ContAlloc:79 ContRel:1 HostLocal:64 RackLocal:14
          2015-10-13 04:38:42,229 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:651 AssignedMaps:0 AssignedReds:0 CompletedMaps:78 CompletedReds:0 ContAlloc:79 ContRel:1 HostLocal:64 RackLocal:14
          2015-10-13 04:53:42,128 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:16 ScheduledReds:651 AssignedMaps:0 AssignedReds:0 CompletedMaps:62 CompletedReds:0 ContAlloc:79 ContRel:1 HostLocal:64 RackLocal:14
          2015-10-13 04:53:42,132 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:651 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:62 CompletedReds:0 ContAlloc:79 ContRel:1 HostLocal:64 RackLocal:14
          2015-10-13 04:54:49,433 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:651 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:62 CompletedReds:0 ContAlloc:84 ContRel:6 HostLocal:64 RackLocal:14
          2015-10-13 04:54:50,451 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:651 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:62 CompletedReds:0 ContAlloc:90 ContRel:12 HostLocal:64 RackLocal:14
          2015-10-13 04:54:51,470 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:651 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:62 CompletedReds:0 ContAlloc:95 ContRel:17 HostLocal:64 RackLocal:14
          2015-10-13 04:54:52,501 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:651 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:62 CompletedReds:0 ContAlloc:114 ContRel:36 HostLocal:64 RackLocal:14
          2015-10-13 04:54:53,553 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:651 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:62 CompletedReds:0 ContAlloc:129 ContRel:51 HostLocal:64 RackLocal:14
          2015-10-13 04:54:54,657 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:651 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:62 CompletedReds:0 ContAlloc:147 ContRel:69 HostLocal:64 RackLocal:14
          2015-10-13 04:54:55,708 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:651 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:62 CompletedReds:0 ContAlloc:155 ContRel:77 HostLocal:64 RackLocal:14
          .......
          
          2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:641 ScheduledMaps:16 ScheduledReds:10 AssignedMaps:0 AssignedReds:0 CompletedMaps:62 CompletedReds:0 ContAlloc:216 ContRel:138 HostLocal:64 RackLocal:14
          2015-10-13 04:55:05,923 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:636 ScheduledMaps:16 ScheduledReds:10 AssignedMaps:0 AssignedReds:5 CompletedMaps:62 CompletedReds:0 ContAlloc:221 ContRel:138 HostLocal:64 RackLocal:14
          2015-10-13 04:55:06,929 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:631 ScheduledMaps:16 ScheduledReds:9 AssignedMaps:0 AssignedReds:11 CompletedMaps:62 CompletedReds:0 ContAlloc:227 ContRel:138 HostLocal:64 RackLocal:14
          2015-10-13 04:55:07,945 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:630 ScheduledMaps:16 ScheduledReds:4 AssignedMaps:0 AssignedReds:17 CompletedMaps:62 CompletedReds:0 ContAlloc:233 ContRel:138 HostLocal:64 RackLocal:14
          2015-10-13 04:55:08,967 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:630 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:21 CompletedMaps:62 CompletedReds:0 ContAlloc:238 ContRel:139 HostLocal:64 RackLocal:14
          2015-10-13 04:55:09,967 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:641 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:21 CompletedMaps:62 CompletedReds:0 ContAlloc:238 ContRel:139 HostLocal:64 RackLocal:14
          2015-10-13 04:55:09,979 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:641 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:16 CompletedMaps:62 CompletedReds:0 ContAlloc:253 ContRel:154 HostLocal:64 RackLocal:14
          2015-10-13 04:55:11,013 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:641 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:10 CompletedMaps:62 CompletedReds:0 ContAlloc:260 ContRel:161 HostLocal:64 RackLocal:14
          2015-10-13 04:55:12,013 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:646 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:10 CompletedMaps:62 CompletedReds:0 ContAlloc:260 ContRel:161 HostLocal:64 RackLocal:14
          2015-10-13 04:55:12,031 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:636 ScheduledMaps:16 ScheduledReds:10 AssignedMaps:0 AssignedReds:8 CompletedMaps:62 CompletedReds:0 ContAlloc:267 ContRel:168 HostLocal:64 RackLocal:14
          2015-10-13 04:55:13,053 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:624 ScheduledMaps:16 ScheduledReds:15 AssignedMaps:0 AssignedReds:12 CompletedMaps:62 CompletedReds:0 ContAlloc:274 ContRel:168 HostLocal:64 RackLocal:14
          2015-10-13 04:55:14,061 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:614 ScheduledMaps:16 ScheduledReds:18 AssignedMaps:0 AssignedReds:19 CompletedMaps:62 CompletedReds:0 ContAlloc:281 ContRel:168 HostLocal:64 RackLocal:14
          ....
          
          2015-10-13 04:58:18,813 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:623 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:44 CompletedMaps:62 CompletedReds:0 ContAlloc:1372 ContRel:964 HostLocal:64 RackLocal:14
          2015-10-13 04:58:18,830 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:623 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:43 CompletedMaps:62 CompletedReds:0 ContAlloc:1386 ContRel:978 HostLocal:64 RackLocal:14
          2015-10-13 04:58:19,855 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:615 ScheduledMaps:16 ScheduledReds:8 AssignedMaps:0 AssignedReds:32 CompletedMaps:62 CompletedReds:0 ContAlloc:1394 ContRel:986 HostLocal:64 RackLocal:14
          2015-10-13 04:58:20,877 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:614 ScheduledMaps:16 ScheduledReds:3 AssignedMaps:0 AssignedReds:38 CompletedMaps:62 CompletedReds:0 ContAlloc:1400 ContRel:986 HostLocal:64 RackLocal:14
          2015-10-13 04:58:21,890 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:605 ScheduledMaps:16 ScheduledReds:9 AssignedMaps:0 AssignedReds:38 CompletedMaps:62 CompletedReds:0 ContAlloc:1405 ContRel:988 HostLocal:64 RackLocal:14
          2015-10-13 04:58:22,897 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:605 ScheduledMaps:16 ScheduledReds:4 AssignedMaps:0 AssignedReds:43 CompletedMaps:62 CompletedReds:0 ContAlloc:1410 ContRel:988 HostLocal:64 RackLocal:14
          ...
          
          
          varun_saxena Varun Saxena added a comment -

          The headroom is not very high (it sometimes even comes back as 0 in the response) because other heavy applications are running. We notice that ramp up always happens and ramp down essentially never does, which schedules reducers too aggressively. As can be seen below, there is no ramp down (except the first time, when all 651 scheduled reduces were ramped down), while ramp up keeps happening. A simplified sketch of the resulting cycle follows the logs.

          2015-10-13 04:36:53,038 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:42,132 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:651
          2015-10-13 04:53:43,135 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:44,137 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:45,140 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:46,143 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:47,146 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:48,149 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:49,152 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:50,155 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:51,158 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:52,161 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:53,164 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:54,167 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:55,170 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:56,181 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:57,184 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:58,187 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:53:59,190 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:00,193 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:01,205 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:02,208 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:03,211 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:04,213 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:05,216 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:06,219 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:07,221 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:08,225 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:09,228 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:10,231 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:11,235 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:12,239 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:13,242 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:14,245 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:15,248 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:16,276 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:17,280 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:18,283 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:19,286 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:20,289 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:21,292 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:22,295 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:23,298 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:24,301 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:25,304 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          2015-10-13 04:54:26,307 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
          
          2015-10-13 04:37:39,685 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: All maps assigned. Ramping up all remaining reduces:651
          2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 10
          2015-10-13 04:55:05,923 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 5
          2015-10-13 04:55:06,929 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 5
          2015-10-13 04:55:07,945 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 1
          2015-10-13 04:55:12,031 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 10
          2015-10-13 04:55:13,053 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 12
          2015-10-13 04:55:14,061 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 10
          2015-10-13 04:55:16,075 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 2
          2015-10-13 04:55:17,092 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 1
          2015-10-13 04:55:20,147 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 1
          2015-10-13 04:55:21,165 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 10
          2015-10-13 04:55:22,175 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 2
          2015-10-13 04:55:23,184 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 5
          2015-10-13 04:55:24,197 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 1
          2015-10-13 04:55:29,299 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 8
          2015-10-13 04:55:30,311 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 15
          2015-10-13 04:55:31,320 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 10
          2015-10-13 04:55:32,327 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 1
          2015-10-13 04:55:43,496 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 1
          2015-10-13 04:55:44,509 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 4
          2015-10-13 04:55:45,521 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 5
          2015-10-13 04:55:46,530 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 4
          2015-10-13 04:55:47,543 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 2
          2015-10-13 04:55:57,680 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 1
          2015-10-13 04:55:58,698 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 6
          2015-10-13 04:55:59,715 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 5
          2015-10-13 04:56:00,721 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 6
          2015-10-13 04:56:05,795 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 2
          2015-10-13 04:56:07,820 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 1
          2015-10-13 04:56:08,831 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 2
          2015-10-13 04:56:09,841 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 2
          2015-10-13 04:56:10,853 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 2
          2015-10-13 04:56:22,018 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 15
          2015-10-13 04:56:23,036 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 2
          2015-10-13 04:56:24,043 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 6
          2015-10-13 04:56:29,114 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 2
          2015-10-13 04:56:31,138 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 4
          2015-10-13 04:56:32,148 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 3
          2015-10-13 04:56:33,157 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 3
          2015-10-13 04:56:45,328 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 6
          2015-10-13 04:56:46,349 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 2
          2015-10-13 04:56:47,356 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 3
          2015-10-13 04:56:57,499 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 8
          2015-10-13 04:56:58,514 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 7
          2015-10-13 04:56:59,521 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping up 10
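
          To make the cycle concrete, here is a simplified, self-contained sketch of the behaviour described above. It is an illustration under the assumption that the headroom reported by the RM stays too small for the 16 hanging maps; it is not the allocator code itself.

          public class RampCycleSketch {
            public static void main(String[] args) {
              int scheduledMaps = 16;    // rescheduled maps that never get containers
              int pendingReduces = 651;  // reduces waiting to be (re)scheduled
              int scheduledReduces = 0;
              int headroom = 10;         // assumed: containers the RM can offer per heartbeat

              for (int round = 1; round <= 3 && scheduledMaps > 0; round++) {
                // "Ramp up": the headroom looks usable, so pending reduces are scheduled again.
                int rampUp = Math.min(headroom, pendingReduces);
                pendingReduces -= rampUp;
                scheduledReduces += rampUp;
                System.out.printf("round %d: ramping up %d reduces%n", round, rampUp);

                // The maps are still starved, so the reduce containers are preempted/released
                // to make room for them ...
                System.out.printf("round %d: preempting %d reduces for %d hanging maps%n",
                    round, scheduledReduces, scheduledMaps);
                pendingReduces += scheduledReduces;
                scheduledReduces = 0;
                // ... and the next heartbeat ramps them up again, so the job never finishes.
              }
            }
          }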
          
          varun_saxena Varun Saxena added a comment -

          yarn.app.mapreduce.am.job.reduce.rampup.limit is at its default value of 0.5.
          Because of this value, it is deemed that the maps have enough resources, and the reducers are ramped up.
          Should we really be ramping up when we have hanging map requests, irrespective of the configuration value?
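
          The arithmetic behind that, as I understand it, looks roughly like the sketch below. It is a rough illustration with assumed numbers, not a quote of scheduleReduces: the limit caps the share of the job's total resource that reduces may take while maps are incomplete, and the map need is judged against the remainder, which on paper always looks sufficient.

          public class RampupLimitSketch {
            public static void main(String[] args) {
              float rampupLimit = 0.5f;      // yarn.app.mapreduce.am.job.reduce.rampup.limit (default)
              int totalMemLimit = 64 * 4096; // assumed: assigned memory plus headroom, in MB
              int mapMemPerTask = 4096;      // per-map container size in this job
              int scheduledMaps = 16;        // the hanging map requests

              int reduceMemLimit = (int) (totalMemLimit * rampupLimit);
              int memLeftForMaps = totalMemLimit - reduceMemLimit;
              int mapMemNeeded = scheduledMaps * mapMemPerTask;

              // With the default 0.5, half of the computed limit is nominally reserved for
              // maps, so the 16 maps look satisfiable and reduces keep getting ramped up,
              // even though the real headroom reported by the RM is close to zero.
              System.out.printf("maps need %d MB, %d MB nominally left for maps -> ramp up reduces: %b%n",
                  mapMemNeeded, memLeftForMaps, mapMemNeeded <= memLeftForMaps);
            }
          }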

          Hide
          varun_saxena Varun Saxena added a comment -

          One more thing I noticed is that in RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled reduces map and move these reducers to pending. This is not reflected in the ask, so the RM keeps assigning containers which the AM cannot use because no reducer is scheduled (see the logs below the code). Although this eventually leads to these reducers not being assigned, why are we not updating the ask immediately?

                  LOG.info("Ramping down all scheduled reduces:"
                      + scheduledRequests.reduces.size());
                  for (ContainerRequest req : scheduledRequests.reduces.values()) {
                    pendingReduces.add(req);
                  }
                  scheduledRequests.reduces.clear();
          
          2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not assigned : container_1437451211867_1485_01_000215
          2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign container Container: [ContainerId: container_1437451211867_1485_01_000216, NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: hdszzdcxdat6g06u04p:26010, Resource: <memory:4096, vCores:1>, Priority: 10, Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a reduce as either  container memory less than required 4096 or no pending reduce tasks - reduces.isEmpty=true
          2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not assigned : container_1437451211867_1485_01_000216
          2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign container Container: [ContainerId: container_1437451211867_1485_01_000217, NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: hdszzdcxdat6g06u06p:26010, Resource: <memory:4096, vCores:1>, Priority: 10, Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a reduce as either  container memory less than required 4096 or no pending reduce tasks - reduces.isEmpty=true
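          A minimal sketch of the kind of change being suggested, i.e. shrinking the ask at the same time the reduces are moved back to pending; the decContainerReq-style helper used here is an assumption for illustration, not the actual patch:

          // Hypothetical sketch only: ramp down scheduled reduces and also decrement the
          // outstanding ask so the RM stops handing out reducer containers the AM will reject.
          LOG.info("Ramping down all scheduled reduces:" + scheduledRequests.reduces.size());
          for (ContainerRequest req : scheduledRequests.reduces.values()) {
            pendingReduces.add(req);
            decContainerReq(req); // assumed RMContainerRequestor-style helper that updates the ask
          }
          scheduledRequests.reduces.clear();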
          
          varun_saxena Varun Saxena added a comment -

          cc Jason Lowe, Karthik Kambatla, Devaraj K, your thoughts on this?

          kasha Karthik Kambatla added a comment -

          Yep, looks like a bug.

          rohithsharma Rohith Sharma K S added a comment -

          Varun Saxena, thanks for your detailed analysis.
          From the logs you extracted in your previous comment, I see that ramping up of reducers happens regardless of whether scheduledMaps is zero or greater than zero. I think the code below should not blindly ramp up the reducers:

          if (rampUp > 0) {
            rampUp = Math.min(rampUp, numPendingReduces);
            LOG.info("Ramping up " + rampUp);
            rampUpReduces(rampUp);
          }
          

          I think checking for scheduledMaps == 0 while ramping up should avoid the issue, regardless of mapper priority. But then the question is: what if the scheduled maps are failed map attempts? A better way to handle this is to check the priority of all the scheduled maps. If the priority of all scheduled maps is less than that of the reducers, then ramping up can be done.

          // If scheduledMaps is non-zero then, regardless of mapper priority, do not ramp up reducers.
          if (rampUp > 0 && scheduledMaps == 0) {
            rampUp = Math.min(rampUp, numPendingReduces);
            LOG.info("Ramping up " + rampUp);
            rampUpReduces(rampUp);
          }
          

          Any thoughts?

          rohithsharma Rohith Sharma K S added a comment -

          Oh!! The solution suggested above, i.e. rampUp > 0 && scheduledMaps == 0, breaks ramping up of a few reducers. But I still feel that ramping up a few intermediate reducer requests should not be done. I do not know the story behind why ramping up was introduced!!?

          varun_saxena Varun Saxena added a comment -

          Yes, I agree. If there are map requests hanging around for a while, we should probably not ramp up the reducers.
          Maybe a config can be added to decide how long to wait before we consider that the mappers have been starved? Thoughts?

          One more thing which I pointed out above is that we do not update the ask when we ramp down all the reducers (in preemptReducesIfNeeded()). I am not sure why we do not do so.

          kasha Karthik Kambatla added a comment -

          Maybe a config can be added to decide how long to wait before we consider that the mappers have been starved? Thoughts?

          MAPREDUCE-6302 essentially adds that. Can we re-use the same config?

          rohithsharma Rohith Sharma K S added a comment -

          Right. The method RMContainerAllocator#getNumHangingRequests can be reused to get the hanging mapper requests, and ramp-up can proceed only if there are no hanging mappers.
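          A rough sketch of that idea, reusing the names from the discussion above (the method's argument shape and the variable names are assumptions, not the final patch):

          // Hypothetical sketch only: gate reducer ramp-up on starving map requests.
          int hangingMapRequests = getNumHangingRequests(scheduledRequests.maps);
          if (rampUp > 0 && hangingMapRequests == 0) {
            rampUp = Math.min(rampUp, numPendingReduces);
            LOG.info("Ramping up " + rampUp);
            rampUpReduces(rampUp);
          }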

          sunilg Sunil G added a comment -

          Hi Rohith Sharma K S
          Yes, getNumHangingRequests looks like the correct metric. Just to add a thought to this discussion: the already-placed reducer requests will still be served by the RM, and the AM has to reject all of them; only after that can the newly placed map requests be served. So, as we discussed earlier, could we also spin out a discussion on resetting the already-placed reducer requests for a faster solution?

          varun_saxena Varun Saxena added a comment -

          Yes Sunil, we need to update the ask to indicate to the RM that it need not allocate for these reducers. This is what I talked about in one of my comments yesterday.
          In short, in this JIRA I intend to take a two-pronged approach to resolve it.
          1. Update the ask to tell the RM that it need not allocate for ramped-down reducers (ramped down in the preemptReducesIfNeeded() method). We are currently testing this change.
          2. Introduce a config, or reuse the MAPREDUCE-6302 config, to determine hanging map requests, and do not ramp up reducers if mappers are starved. I have not looked at the post-MAPREDUCE-6302 code, but this is the basic idea.

          sunilg Sunil G added a comment -

          Hi Varun Saxena
          I feel point 1 can be tracked separately, as it may bring more complexity. I can give an example.

          Initially the AM has placed 10 reducer requests at timeframe1. Assume that in the next AM heartbeat we try to reset this count to 5 because of the issues we found. However, the RM could already have allocated some containers against the previously placed requests.

          So in the new AM heartbeat we will have an updated ask of 5 reducers for timeframe1, and in the response we may receive containers the RM newly allocated for the earlier requests. The AM then has to reject them or update the count again in the next heartbeat, and this may go on.

          The AM will reject the allocated reducer containers, but a lot of rejections may occur in these corner cases, so we need to be careful here.

          varun_saxena Varun Saxena added a comment -

          Yes, we see rejections in our case too. I am fine with tracking it separately. I will file a JIRA and we can discuss further there.

          sunilg Sunil G added a comment -

          OK, I also think so. Rohith Sharma K S, how do you feel?

          varun_saxena Varun Saxena added a comment -

          Filed MAPREDUCE-6514

          chen317 chong chen added a comment -

          Varun, thanks for your detailed analysis. I do have a question though.

          Looking at the flow, if the mapper tasks failed because of the node, why did the MapReduce Application Master not treat this as a map task failure? If it did, the current logic would reset the map priority to PRIORITY_FAST_FAIL_MAP instead of PRIORITY_MAP, so by design it would have a higher priority than the reducers, and the problem you mention would no longer be a problem. Is there any particular reason why the failed map task was not recognized as such?

          Of course, the current YARN RM/AM protocol is not a strict delta-based protocol; it suffers from inconsistency between the parties and causes lots of race conditions. Redesigning the protocol is not easy work; for now, what we can do is fix the issues one by one. So I agree with logging MAPREDUCE-6514 to track this individual case.

          Chong

          devaraj.k Devaraj K added a comment -

          I agree with chong chen. Failed maps have a higher priority (PRIORITY_FAST_FAIL_MAP) than the reducers (PRIORITY_REDUCE), so the MR AM should get a container for a failed map before a reducer here, if resources are available for the map.

          Bob.zhao/Varun Saxena, what is the map memory request for this job? And do you have a chance to share the complete MR App Master log?
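          For reference, a small self-contained sketch of the priority ordering being discussed; 5 (fast-fail map) and 10 (reduce) are visible in the logs earlier in this thread, while 20 for ordinary maps is an assumption added only for illustration:

          import org.apache.hadoop.yarn.api.records.Priority;

          public class RequestPrioritySketch {
            // Lower value = served earlier by the scheduler.
            static final Priority FAST_FAIL_MAP = Priority.newInstance(5);
            static final Priority REDUCE = Priority.newInstance(10);
            static final Priority MAP = Priority.newInstance(20); // assumption for illustration

            public static void main(String[] args) {
              // A re-requested map at priority 5 outranks reducers at priority 10, which is why
              // bumping maps killed on a bad node avoids the starvation described above.
              System.out.println("fast-fail map = " + FAST_FAIL_MAP.getPriority()
                  + ", reduce = " + REDUCE.getPriority() + ", map = " + MAP.getPriority());
            }
          }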

          varun_saxena Varun Saxena added a comment -

          Thanks chong chen and Devaraj K for sharing your thoughts on this.

          The obvious solution we considered when we hit this issue was to mark the map task as failed so that its priority becomes 5, which would mean the scheduler assigns resources to it before the reducers. But after a long internal discussion, we decided against it. The main reason is: should we mark a mapper as failed when it is perfectly fine and has already been marked succeeded? Also, this would be counted towards task attempt failures. Whether to kill it or fail it is frankly a debatable topic, and there was a long discussion on it in the JIRA where this code was added (refer to MAPREDUCE-3921).
          cc Bikas Saha, Vinod Kumar Vavilapalli so that they can also share their thoughts on this.

          Moreover, once the map task has been killed, it is as good as an original task attempt in the scheduled stage (with a new task attempt scheduled). So if resources could be assigned to the original attempt, they should be assignable to this new attempt as well (if headroom is available). This made me think that there must be some other problem as well. Kindly note that the slowstart.completedmaps config here was 0.05.

          Assuming the headroom coming from the RM was correct, we dug into the logs and found a couple of issues. As pointed away there was a loop of reducers being preempted and ramped up again and again.
          Firstly, we noticed that the AM was always ramping up and never ramping down reducers. So we thought we could add a configuration which decides when the maps are starved, and not ramp up reducers if maps are starving. This would ensure that maps get more of a chance to be assigned in the above scenario.
          Secondly, when we ramp down all the scheduled reduces, we were not updating the ask, and hence the RM kept allocating resources for reducers (which were later rejected by the AM) even though it could have assigned these resources to mappers straight away.

          varun_saxena Varun Saxena added a comment -

          Sorry meant "As pointed out there was a loop of reducers being preempted and ramped up again and again."

          chen317 chong chen added a comment -

          How to re-schedule failed/killed tasks and how to account for the task exit reason are two different things.

          In your case the node is not healthy, which is a typical abnormal cause of task failure, and a low-probability event in a healthy cluster. For a small set of map task reruns, we should be smart enough to let them complete quickly rather than going through this heavy reducer ramp-up/ramp-down flow, because it not only slows down overall job scheduling throughput but also adds unnecessary load on the YARN core scheduler. Workload requests (over 600 reducers) have already been submitted to the system; for a small set of map tasks, having the AM ramp down all the reducers, push those few mappers to the front of the queue to get them scheduled, and then gradually re-submit the reducers is not an efficient way to handle things. It generates unnecessary load on the core scheduler. YARN is the central brain of a big data system and manages large-scale multi-tenant clusters; the design philosophy should always keep that in mind and try to reduce unnecessary load on the core.

          I think what you discovered later is a problem, and we need to correct it. But for this particular case, I still prefer treating these as abnormal failures and bumping up the task priority.

          chen317 chong chen added a comment -

          Another way to think about this: the current reducer ramp-up/ramp-down is designed to handle the normal case, like this one where the slowstart.completedmaps config was 0.05. Once all mapper tasks are scheduled and allocated, the AM has already submitted all reducers to the system. At this stage, it is natural to handle a failed mapper as an abnormal case rather than resetting the whole thing and going through ramp-up/ramp-down again.

          sunilg Sunil G added a comment -

          In my opinion, the failure of the node is not an issue caused by the job (or AM). It is a case where a node went down due to some other problem (an OS bug or maintenance work). I feel it is better not to count such cases as a task attempt failure, because that can ultimately result in the job failing. (A problem in the cluster/YARN need not count towards any job/application failure counts.)
          So handling the bug along the lines of "do not ramp up reducers when there is a hanging map" seems like a better approach here. Thoughts?

          chen317 chong chen added a comment -

          How to account for task failure and how to re-schedule tasks are two different things; I don't understand why we have to tie the two together. This seems to be a design limitation. Clearly, for this case, raising the priority is the optimal solution. Since the AM has already finished ramping up the reducers once (651 reducers), repeating that process means ramping the whole thing down and gradually ramping up again, which generates another round of communication overhead between the AM and the RM/scheduler.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Went through the discussion. Here's what we should do, mostly agreeing with what chong chen says.

          • Node failure should not be counted towards the task-attempt failure count. So, yes, let's continue to mark such tasks as killed.
          • Rescheduling of this killed task can (and must) take higher priority independent of whether it is marked as killed or failed. In fact, this is how we originally designed the failed-map-should-have-higher-priority concept. In spirit, fail-fast-map actually meant maps which retroactively failed, like in this case.

          Varun Saxena, I can take a stab at this if you don't have cycles. Let me know either way.

          IAC, this has been a long-standing problem (though I'm very surprised nobody caught it till now), so I'd propose we move this out to 2.7.3 so I can make progress on the 2.7.2 release. Thoughts? /cc Bob.zhao

          varun_saxena Varun Saxena added a comment -

          Thanks Vinod Kumar Vavilapalli for your input.

          During an offline chat with Varun Vasudev, Sunil G and Rohith Sharma K S yesterday, this JIRA came up for discussion, and we too were in general agreement with chong chen that we should not mix up rescheduling-with-higher-priority and task failure. If a node becomes unusable, since the maps had already completed, they should be taken up again immediately, and setting a higher priority achieves that. We can still avoid marking this as a failed attempt, though.

          I was in fact about to raise a JIRA to handle that separately, to draw attention to this issue.
          But based on your comment on MAPREDUCE-6514, let's move what I was planning to do here over there, so that we can discuss it further. If required, one more JIRA can be raised.

          And we can adopt that approach here.
          I think I will get cycles for this, as this issue came from our customer.

          Also, I think there is no need to hold up 2.7.2 for this and we can move it to 2.7.3. Bob.zhao should be OK with this as well, as he is on my team. If required, i.e. if we decide not to use 2.7.3 or 2.7.3 is late, I will merge this into our internal branch.

          rohithsharma Rohith Sharma K S added a comment -

          I think release 2.7.2 need not be held up because of this issue, since it appears very rarely and is very hard to reproduce!! That said, if a solution is ready, agreed and available, then it is fine to keep it in 2.7.2. I am fine with either way too!!

          Coming back to the issue discussion,

          Rescheduling of this killed task can (and must) take higher priority independent of whether it is marked as killed or failed

          This is the best way to solve it. It also covers another, so far uncovered, scenario that can lead to the same issue, i.e. when completed OR running tasks are killed using the MR client. While trying to reproduce the current issue, I killed completed tasks using the MR client, and for 3-4 iterations ramping up happened similarly to this issue, but at some point the calculations went from ABNORMAL back to NORMAL!!

          And one of the challenges is regression. Even though increasing the priority solves the hang in one way, I am wondering whether configuring the slow-start value differently could still cause a hang, i.e. going into a loop. Any thoughts?

          varun_saxena Varun Saxena added a comment -

          Vinod Kumar Vavilapalli, attaching an initial patch. Kindly review.

          This patch primarily does the following:

          1. When an unusable node is reported, task attempt kill events are sent for the completed and running map tasks which ran on that node. A flag has been added to this event to indicate whether the next task attempt will be rescheduled (scheduled with the higher priority of 5). For an unusable node it is marked to be rescheduled. If a task attempt is killed by the client, it will not be rescheduled with higher priority. I am not 100% convinced whether a user-initiated kill should lead to a higher priority. Your thoughts on this? (A minimal sketch of the flag idea follows this list.)
          2. This reschedule flag is then forwarded to TaskImpl in the attempt-killed event, after the killing of the attempt is complete.
          3. Based on this flag, the task will create a new attempt and send a TA_RESCHEDULE or TA_SCHEDULE event while processing the attempt-kill event. As it is a kill event, it is not counted towards failed attempts. If the attempt has to be rescheduled, TaskAttemptImpl will send a container request event to RMContainerAllocator. From here on, this will be treated like a failed map and hence its priority will be 5. As with failed maps, node or rack locality is not ensured; node locality cannot be ensured anyway until the node comes back up.
          4. As on recovery we only consider SUCCESSFUL tasks, I think we need not update this flag in the history file.
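          A minimal, framework-free sketch of the flag idea from step 1; the class and method names here are illustrative and are not the classes touched by the patch:

          // Hypothetical sketch only: a kill event that records whether the next attempt
          // should be re-requested at the higher (fast-fail) priority.
          class KillEventSketch {
            private final String attemptId;
            private final boolean rescheduleNextAttempt;

            KillEventSketch(String attemptId, boolean rescheduleNextAttempt) {
              this.attemptId = attemptId;
              this.rescheduleNextAttempt = rescheduleNextAttempt;
            }

            String getAttemptId() {
              return attemptId;
            }

            // A TaskImpl-equivalent would consult this when creating the next attempt.
            boolean shouldRescheduleWithHigherPriority() {
              return rescheduleNextAttempt;
            }
          }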
          leftnoteasy Wangda Tan added a comment -

          I linked MAPREDUCE-6541 to this JIRA, they're different fixes for similar issues.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Tx for the update, Varun Saxena!

          Apologies for missing your updated patch for this long!

          (Reviewing an MR patch after a looong time!)

          First up, the patch doesn't apply anymore, can you please update?

          I tried to review it despite the conflicts, some comments:

          • The logic looks good overall! You are right that user initiated kill should not lead to a higher priority.
          • We want to be sure that existing semantics in RMContainerAllocator about failed-maps are really about task-attempts that need to be rescheduled and not just failed-maps. I briefly looked, but it will be good for you to also reverify!
          • TestTaskAttempt.java
            • Most (all?) of the code can be reused between testContainerKillOnNew and testContainerKillOnUnassigned.
            • Also, in the existing tests we should leave rescheduleAttempt as false except in the new one, testKillMapTaskAfterSuccess. You have enough coverage elsewhere that we should simply drop these changes except for the new tests.
          • TestMRApp.java.testUpdatedNodes: Instead of checking for reschedule events, is it possible to explicitly check for the higher priority of the corresponding request?
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Varun Saxena, let me know if you can update this soon enough for 2.7.3, i.e. in a couple of days. Otherwise, we can simply move this to 2.8 in a few weeks.

          varun_saxena Varun Saxena added a comment -

          Vinod Kumar Vavilapalli, sorry, I was taken over by some internal work so I could not update it.
          I will update the patch by tomorrow.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 6m 41s trunk passed
          +1 compile 0m 18s trunk passed with JDK v1.8.0_77
          +1 compile 0m 23s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 28s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 0m 43s trunk passed
          +1 javadoc 0m 15s trunk passed with JDK v1.8.0_77
          +1 javadoc 0m 17s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 22s the patch passed
          +1 compile 0m 16s the patch passed with JDK v1.8.0_77
          +1 javac 0m 16s the patch passed
          +1 compile 0m 21s the patch passed with JDK v1.7.0_95
          +1 javac 0m 21s the patch passed
          -1 checkstyle 0m 20s hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app: patch generated 2 new + 548 unchanged - 1 fixed = 550 total (was 549)
          +1 mvnsite 0m 25s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          -1 whitespace 0m 0s The patch has 1 line(s) with tabs.
          +1 findbugs 0m 52s the patch passed
          +1 javadoc 0m 12s the patch passed with JDK v1.8.0_77
          +1 javadoc 0m 14s the patch passed with JDK v1.7.0_95
          +1 unit 9m 9s hadoop-mapreduce-client-app in the patch passed with JDK v1.8.0_77.
          +1 unit 9m 48s hadoop-mapreduce-client-app in the patch passed with JDK v1.7.0_95.
          +1 asflicense 0m 21s Patch does not generate ASF License warnings.
          33m 23s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12797702/MAPREDUCE-6513.02.patch
          JIRA Issue MAPREDUCE-6513
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 568c8ea75ff0 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 594c70f
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6420/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt
          whitespace https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6420/artifact/patchprocess/whitespace-tabs.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6420/testReport/
          modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6420/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 14s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 6m 43s trunk passed
          +1 compile 0m 20s trunk passed with JDK v1.8.0_77
          +1 compile 0m 23s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 27s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 0m 44s trunk passed
          +1 javadoc 0m 16s trunk passed with JDK v1.8.0_77
          +1 javadoc 0m 19s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 22s the patch passed
          +1 compile 0m 16s the patch passed with JDK v1.8.0_77
          +1 javac 0m 16s the patch passed
          +1 compile 0m 20s the patch passed with JDK v1.7.0_95
          +1 javac 0m 20s the patch passed
          -1 checkstyle 0m 21s hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app: patch generated 1 new + 548 unchanged - 1 fixed = 549 total (was 549)
          +1 mvnsite 0m 25s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 0m 54s the patch passed
          +1 javadoc 0m 13s the patch passed with JDK v1.8.0_77
          +1 javadoc 0m 15s the patch passed with JDK v1.7.0_95
          +1 unit 9m 10s hadoop-mapreduce-client-app in the patch passed with JDK v1.8.0_77.
          +1 unit 9m 47s hadoop-mapreduce-client-app in the patch passed with JDK v1.7.0_95.
          +1 asflicense 0m 17s Patch does not generate ASF License warnings.
          33m 40s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12797710/MAPREDUCE-6513.03.patch
          JIRA Issue MAPREDUCE-6513
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 81a3b046ca4a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 594c70f
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6421/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6421/testReport/
          modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6421/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          varun_saxena Varun Saxena added a comment -

          To fix the checkstyle issue I would need to change the indentation of surrounding code that does not otherwise need to change, so I have left it as it is.

          Regarding checking for the priority instead of the reschedule event: the priority is set in RMContainerAllocator, and TestMRApp uses a custom allocator, so we cannot check it there.
          We can, however, check ContainerRequestEvent and see whether the flag indicating that the earlier map task attempt failed is set. If it is set, RMContainerAllocator will set the priority of the next map task to 5.
          And we have coverage in TestRMContainerAllocator for that part of the flow.

          Show
          varun_saxena Varun Saxena added a comment - For checkstyle issue to be fixed I would need to change indentation of surrounding code which is not required to be changed. So I have left it as it is. Regarding checking for priority as compared to rescheduled event, well the priority is set in RMContainerAllocator. In TestMRApp, there is a custom allocator so we cannot check that. We can however check ContainerRequestEvent and see if the flag for earlier map task-attempt failed is set or not. If its set RMContainerAllocator will set the priority of next map task to 5. And we have coverage in TestRMContainerAllocator for that part of the flow.
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          I'm doing a final pass of the review; in the meanwhile, Wangda Tan, can you take a look too?

          leftnoteasy Wangda Tan added a comment -

          Patch looks good to me, thanks Varun Saxena!

          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9613 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9613/)
          MAPREDUCE-6513. MR job got hanged forever when one NM unstable for some (wangda: rev 8b2880c0b62102fc5c8b6962752f72cb2c416a01)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/TaskTAttemptKilledEvent.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/event/TaskAttemptKillEvent.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
          Hide
          leftnoteasy Wangda Tan added a comment -

          Committed to branch-2 / trunk.

          Thanks Varun Saxena for working on the patch, and thanks Devaraj K/chong chen/Sunil G/Vinod Kumar Vavilapalli/Rohith Sharma K S for reviews!

          Rebased & attached patch for branch-2.8, pending Jenkins.

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          -1 patch 0m 4s MAPREDUCE-6513 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



          Subsystem Report/Notes
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12798789/MAPREDUCE-6513-1-branch-2.8.patch
          JIRA Issue MAPREDUCE-6513
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6430/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 11m 59s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 9m 7s branch-2.8 passed
          +1 compile 0m 21s branch-2.8 passed with JDK v1.8.0_77
          +1 compile 0m 22s branch-2.8 passed with JDK v1.7.0_95
          +1 checkstyle 0m 26s branch-2.8 passed
          +1 mvnsite 0m 30s branch-2.8 passed
          +1 mvneclipse 0m 18s branch-2.8 passed
          +1 findbugs 0m 54s branch-2.8 passed
          +1 javadoc 0m 15s branch-2.8 passed with JDK v1.8.0_77
          +1 javadoc 0m 18s branch-2.8 passed with JDK v1.7.0_95
          +1 mvninstall 0m 22s the patch passed
          +1 compile 0m 16s the patch passed with JDK v1.8.0_77
          +1 javac 0m 16s the patch passed
          +1 compile 0m 20s the patch passed with JDK v1.7.0_95
          +1 javac 0m 20s the patch passed
          -1 checkstyle 0m 19s hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app: patch generated 3 new + 560 unchanged - 3 fixed = 563 total (was 563)
          +1 mvnsite 0m 24s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 0m 58s the patch passed
          +1 javadoc 0m 13s the patch passed with JDK v1.8.0_77
          +1 javadoc 0m 15s the patch passed with JDK v1.7.0_95
          -1 unit 8m 40s hadoop-mapreduce-client-app in the patch failed with JDK v1.8.0_77.
          -1 unit 9m 25s hadoop-mapreduce-client-app in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 25s Patch does not generate ASF License warnings.
          47m 20s



          Reason Tests
          JDK v1.8.0_77 Failed junit tests hadoop.mapreduce.v2.app.TestMRApp
          JDK v1.7.0_95 Failed junit tests hadoop.mapreduce.v2.app.TestMRApp



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:c60792e
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12798806/MAPREDUCE-6513.3.branch-2.8.patch
          JIRA Issue MAPREDUCE-6513
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux d3074c6027b3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.8 / 8b1e784
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6431/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt
          unit https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6431/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.8.0_77.txt
          unit https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6431/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6431/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6431/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6431/testReport/
          modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6431/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          Hide
          leftnoteasy Wangda Tan added a comment -

          Committed MAPREDUCE-4785 to branch-2.7/branch-2.8. Attached a new patch.

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 8m 43s branch-2.8 passed
          +1 compile 0m 18s branch-2.8 passed with JDK v1.8.0_77
          +1 compile 0m 21s branch-2.8 passed with JDK v1.7.0_95
          +1 checkstyle 0m 26s branch-2.8 passed
          +1 mvnsite 0m 28s branch-2.8 passed
          +1 mvneclipse 0m 19s branch-2.8 passed
          +1 findbugs 0m 53s branch-2.8 passed
          +1 javadoc 0m 15s branch-2.8 passed with JDK v1.8.0_77
          +1 javadoc 0m 17s branch-2.8 passed with JDK v1.7.0_95
          +1 mvninstall 0m 22s the patch passed
          +1 compile 0m 15s the patch passed with JDK v1.8.0_77
          +1 javac 0m 15s the patch passed
          +1 compile 0m 19s the patch passed with JDK v1.7.0_95
          +1 javac 0m 19s the patch passed
          -1 checkstyle 0m 19s hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app: patch generated 3 new + 560 unchanged - 3 fixed = 563 total (was 563)
          +1 mvnsite 0m 23s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 0m 51s the patch passed
          +1 javadoc 0m 12s the patch passed with JDK v1.8.0_77
          +1 javadoc 0m 15s the patch passed with JDK v1.7.0_95
          +1 unit 8m 38s hadoop-mapreduce-client-app in the patch passed with JDK v1.8.0_77.
          +1 unit 9m 20s hadoop-mapreduce-client-app in the patch passed with JDK v1.7.0_95.
          +1 asflicense 0m 17s Patch does not generate ASF License warnings.
          34m 35s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:c60792e
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12798844/MAPREDUCE-6513.3_1.branch-2.8.patch
          JIRA Issue MAPREDUCE-6513
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 7c4e45f0f19e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.8 / 8da0a49
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6432/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6432/testReport/
          modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6432/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          Hide
          leftnoteasy Wangda Tan added a comment -

          Committed to branch-2.8.

          We need to backport MAPREDUCE-5817 to branch-2.7 before this patch; otherwise it will cause a couple of conflicts. Waiting for suggestions from Sangjin and Karthik on backporting MAPREDUCE-5817.

          Hide
          leftnoteasy Wangda Tan added a comment -

          Rebased branch-2.7 patch.

          Since MAPREDUCE-6513 is on top of MAPREDUCE-5465, and the scope of MAPREDUCE-5465 seems too big to pull into branch-2.7, I just manually resolved a couple of conflicts. Ran the related unit tests; all passed.

          Varun Saxena, Vinod Kumar Vavilapalli, could you take a final look at attached patch?

          Thanks,

          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 8m 37s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 8m 5s branch-2.7 passed
          +1 compile 0m 16s branch-2.7 passed with JDK v1.8.0_77
          +1 compile 0m 21s branch-2.7 passed with JDK v1.7.0_95
          +1 checkstyle 0m 41s branch-2.7 passed
          +1 mvnsite 0m 29s branch-2.7 passed
          +1 mvneclipse 0m 19s branch-2.7 passed
          +1 findbugs 0m 50s branch-2.7 passed
          +1 javadoc 0m 16s branch-2.7 passed with JDK v1.8.0_77
          +1 javadoc 0m 16s branch-2.7 passed with JDK v1.7.0_95
          +1 mvninstall 0m 21s the patch passed
          +1 compile 0m 15s the patch passed with JDK v1.8.0_77
          -1 javac 2m 24s hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.8.0_77 with JDK v1.8.0_77 generated 1 new + 84 unchanged - 0 fixed = 85 total (was 84)
          +1 javac 0m 15s the patch passed
          +1 compile 0m 18s the patch passed with JDK v1.7.0_95
          -1 javac 2m 42s hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.7.0_95 with JDK v1.7.0_95 generated 1 new + 85 unchanged - 0 fixed = 86 total (was 85)
          +1 javac 0m 18s the patch passed
          -1 checkstyle 0m 35s hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app: patch generated 33 new + 1673 unchanged - 2 fixed = 1706 total (was 1675)
          +1 mvnsite 0m 23s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          -1 whitespace 0m 0s The patch has 2871 line(s) that end in whitespace. Use git apply --whitespace=fix.
          -1 whitespace 1m 11s The patch has 303 line(s) with tabs.
          +1 findbugs 0m 50s the patch passed
          +1 javadoc 0m 11s the patch passed with JDK v1.8.0_77
          +1 javadoc 0m 14s the patch passed with JDK v1.7.0_95
          -1 unit 8m 5s hadoop-mapreduce-client-app in the patch failed with JDK v1.8.0_77.
          +1 unit 8m 46s hadoop-mapreduce-client-app in the patch passed with JDK v1.7.0_95.
          -1 asflicense 0m 57s Patch generated 67 ASF License warnings.
          43m 47s



          Reason Tests
          JDK v1.8.0_77 Failed junit tests hadoop.mapreduce.v2.app.TestRuntimeEstimators



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:c420dfe
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12799439/MAPREDUCE-6513.3_1.branch-2.7.patch
          JIRA Issue MAPREDUCE-6513
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 5f68ddfc0c13 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.7 / cc6ae6f
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          javac hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.8.0_77: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6448/artifact/patchprocess/diff-compile-javac-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.8.0_77.txt
          javac hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.7.0_95: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6448/artifact/patchprocess/diff-compile-javac-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.7.0_95.txt
          checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6448/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt
          whitespace https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6448/artifact/patchprocess/whitespace-eol.txt
          whitespace https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6448/artifact/patchprocess/whitespace-tabs.txt
          unit https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6448/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.8.0_77.txt
          unit test logs https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6448/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.8.0_77.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6448/testReport/
          asflicense https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6448/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6448/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          Hide
          jianhe Jian He added a comment -

          It looks like TaskAttemptKillEvent will be sent twice for each mapper.
          The first is at the code below in RMContainerAllocator#handleUpdatedNodes; JobImpl will in turn send a TaskAttemptKillEvent for each mapper that ran on the unusable node.

                // send event to the job to act upon completed tasks
                eventHandler.handle(new JobUpdatedNodesEvent(getJob().getID(),
                    updatedNodes));
          

          The second is sent by this code in the same method:

                      // If map, reschedule next task attempt.
                      boolean rescheduleNextAttempt = (i == 0) ? true : false;
                      eventHandler.handle(new TaskAttemptKillEvent(tid,
                          "TaskAttempt killed because it ran on unusable node"
                              + taskAttemptNodeId, rescheduleNextAttempt));
                    }
          

          This is how it has been for a long time; I'm not sure why. With the new change, will this cause more container requests to get scheduled?

          Hide
          varun_saxena Varun Saxena added a comment -

          Jian He, the former is for rescheduling completed maps (as their output may be unusable) and the latter is for currently assigned maps.
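          A rough, self-contained sketch of those two paths follows. It is not the actual RMContainerAllocator#handleUpdatedNodes code: the event names follow the snippets quoted above, and every type and helper here is a stand-in for illustration.

            import java.util.Arrays;
            import java.util.List;

            // All types below are stand-ins; the real code dispatches real events through an EventHandler.
            final class UnusableNodeHandlingSketch {

              /** Stand-in for the AM event bus. */
              private static void dispatch(String event) {
                System.out.println("dispatch: " + event);
              }

              static void handleUpdatedNodes(List<String> unusableNodes,
                                             List<String> assignedMaps,
                                             List<String> assignedReduces) {
                // Path 1: JobUpdatedNodesEvent -> JobImpl kills the *completed* map attempts that
                // ran on these nodes (their output may be unusable) and reschedules them.
                dispatch("JobUpdatedNodesEvent" + unusableNodes);

                // Path 2: TaskAttemptKillEvent for attempts currently *assigned* to containers on
                // these nodes; only map attempts request rescheduling of the next attempt.
                for (String tid : assignedMaps) {
                  dispatch("TaskAttemptKillEvent(" + tid + ", rescheduleNextAttempt=true)");
                }
                for (String tid : assignedReduces) {
                  dispatch("TaskAttemptKillEvent(" + tid + ", rescheduleNextAttempt=false)");
                }
              }

              public static void main(String[] args) {
                handleUpdatedNodes(Arrays.asList("nodeA:26009"),
                    Arrays.asList("map_attempt_0"),
                    Arrays.asList("reduce_attempt_0"));
              }
            }

          In other words, the two dispatch sites target disjoint sets of attempts (completed versus currently assigned), so no attempt is killed twice.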

          Hide
          varun_saxena Varun Saxena added a comment -

          Wangda Tan, I will check the 2.7 patch and let you know.

          Hide
          varun_saxena Varun Saxena added a comment -

          The rebased 2.7 patch LGTM.

          Hide
          jianhe Jian He added a comment -

          Committed to branch-2.7, thanks Wangda!
          Thanks Varun for reviewing the patch!

          Hide
          leftnoteasy Wangda Tan added a comment -

          Credit to Varun Saxena for working on this patch!

          Hide
          varun_saxena Varun Saxena added a comment -

          Thanks Wangda Tan, Jian He and Vinod Kumar Vavilapalli for the review and commit.
          Thanks chong chen, Devaraj K, Rohith Sharma K S and Sunil G for the additional reviews and discussions.

          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Closing the JIRA as part of 2.7.3 release.


            People

            • Assignee:
              varun_saxena Varun Saxena
            • Reporter:
              Jobo Bob.zhao
            • Votes:
              0
            • Watchers:
              26
