[MAPREDUCE-5617] map task is not re-launched when the task is failed while reducers are running with full cluster capacity - which will lead to job hang - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Invalid
Affects Version/s: 2.2.0
Fix Version/s: None
Component/s: None
Labels: None
Environment: SuSe Linux

Description

In a cluster with 16 GB of capacity, a job was started with 100 maps and 10 reducers.

After the reducers started executing, one NM went down, which caused 2 maps to fail. At that point the remaining 8 GB was fully used by 6 reducers and the AM, so there was no room to launch the failed maps. [The NM never came back up, so the cluster size stayed at 8 GB.]

Even if we kill one of the reducers, the failed map still cannot be launched, because the priority of the failed map is lower than that of the reducer. So the RM allocates the freed capacity only to the remaining reducers.

This causes the job to hang on the reducer side: the running reducers wait for map output that is never produced.
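
The scenario can be sketched with a toy model. This is not MapReduce AM or YARN scheduler code; the class `AllocationSketch`, its methods, the 1 GB container size, and the rank values are all hypothetical. The ordering in which reducers are served ahead of failed maps is the reporter's description, not verified behavior, and the sketch ignores the fact that a killed reducer would itself need to be rescheduled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class AllocationSketch {
    /** A pending container request. In this toy model a lower rank is served first. */
    static final class Request {
        final String kind;
        final int rank;
        final int memoryGb;

        Request(String kind, int rank, int memoryGb) {
            this.kind = kind;
            this.rank = rank;
            this.memoryGb = memoryGb;
        }
    }

    /** Pending work from the report: 2 failed maps and 4 reducers not yet started (1 GB each, assumed). */
    static List<Request> pendingRequests(int reduceRank, int failedMapRank) {
        List<Request> requests = new ArrayList<>();
        for (int i = 0; i < 2; i++) requests.add(new Request("failed-map", failedMapRank, 1));
        for (int i = 0; i < 4; i++) requests.add(new Request("reduce", reduceRank, 1));
        return requests;
    }

    /** Serves requests in rank order until the head request no longer fits in freeGb. */
    static List<String> allocate(int freeGb, List<Request> pending) {
        PriorityQueue<Request> queue =
            new PriorityQueue<>(Comparator.comparingInt((Request r) -> r.rank));
        queue.addAll(pending);
        List<String> served = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().memoryGb <= freeGb) {
            Request next = queue.poll();
            freeGb -= next.memoryGb;
            served.add(next.kind);
        }
        return served;
    }

    public static void main(String[] args) {
        // Cluster full (all 8 GB in use): nothing can be served.
        System.out.println(allocate(0, pendingRequests(1, 2)));
        // One reducer killed, 1 GB free, reducers ranked ahead as the report describes.
        System.out.println(allocate(1, pendingRequests(1, 2)));
        // Same situation with failed maps ranked ahead, for contrast.
        System.out.println(allocate(1, pendingRequests(2, 1)));
    }
}
```

Under the ordering the report describes, the single freed gigabyte goes to a reducer request and the failed maps stay pending; with the ranks swapped, a failed map is served first.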

People

Assignee: Unassigned

Reporter: Sunil G

Votes: 1

Watchers: 5

Dates

Created: 11/Nov/13 06:28

Updated: 09/May/15 02:17

Resolved: 09/May/15 02:17