Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.3.0
- Labels: None
Description
We're seeing a behavior where a job keeps running long after all of its reducers have finished. We found that the job was rescheduling and re-running a number of mappers past the point of reducer completion. In one case, the job ran for some 9 more hours after all reducers completed!
This happens because whenever a node transition (to an unusable state) reaches the app master, it unconditionally reschedules all mappers that previously ran on that node. Any node transition therefore has the potential to extend the job's runtime; once this window opens, another node transition can prolong it further, and in theory this can continue indefinitely.
If there is some instability in the node pool (unhealthy nodes, etc.) for a period of time, any big job is severely vulnerable to this problem.
If all reducers have completed, JobImpl.actOnUnusableNode() should not reschedule mapper tasks: at that point the map outputs are no longer needed, so any rescheduled mappers would produce output that is never consumed.
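Below is a minimal sketch (Java, in the style of the MR app master) of what such a guard could look like. Apart from actOnUnusableNode() itself, the field and helper names here (nodesToSucceededTaskAttempts, allReducersComplete(), succeededReduceTaskCount) are illustrative assumptions, not the committed patch:
{code:java}
// Sketch only: this would live inside JobImpl; fields and helpers other
// than actOnUnusableNode() are illustrative assumptions.
private void actOnUnusableNode(NodeId nodeId, NodeState nodeState) {
  if (allReducersComplete()) {
    // No reducer can ever fetch map output again, so re-running the maps
    // that succeeded on this node would only prolong the job.
    return;
  }
  // Existing behavior: kill (and thereby cause rescheduling of) every map
  // attempt that previously succeeded on the now-unusable node.
  List<TaskAttemptId> succeededOnNode = nodesToSucceededTaskAttempts.get(nodeId);
  if (succeededOnNode != null) {
    for (TaskAttemptId attemptId : succeededOnNode) {
      eventHandler.handle(new TaskAttemptKillEvent(attemptId,
          "TaskAttempt killed because it ran on unusable node " + nodeId));
    }
  }
}

// Assumed helper: treat the reduce phase as complete only when the job
// actually has reducers; with zero reducers the map outputs ARE the job
// output, so map reruns must still happen.
private boolean allReducersComplete() {
  return numReduceTasks > 0 && numReduceTasks == succeededReduceTaskCount;
}
{code}
With a guard like this, a node turning unusable after the last reducer finishes becomes a no-op for map rescheduling, closing the window in which node churn can extend the job indefinitely.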
Attachments
Issue Links
- is related to:
  - YARN-1996 Provide alternative policies for UNHEALTHY nodes. (Open)
  - MAPREDUCE-7109 On completion of shuffle phase in reducers, mappers should not be launched again (Open)
  - MAPREDUCE-6870 Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers) (Resolved)