[MAPREDUCE-6489] Fail fast rogue tasks that write too much to local disk - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.7.1
Fix Version/s: 2.8.0, 3.0.0-alpha1
Component/s: task
Labels: None
Target Version/s: 2.7.1
Hadoop Flags: Reviewed

Description

Tasks of rogue jobs can write too much to local disk, negatively affecting jobs running in collocated containers. Ideally, YARN would limit the amount of local disk used by each task (~~YARN-4011~~). Until then, a MapReduce task can fail fast if it writes more than a configured threshold to local disk.

As we discussed here, the suggested approach is for the MapReduce task to check the BYTES_WRITTEN counter for the local disk and throw an exception when it exceeds a configured value. The number of bytes written can be larger than the disk space actually used, but detecting a rogue task does not require the exact value: a very large number of bytes written to local disk is a good indication that the task is misbehaving.
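
The check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the attached patches: the class names `LocalWriteLimitChecker` and `WriteLimitExceededException`, the use of a negative limit to mean "disabled", and the `LongSupplier` standing in for the local file system's BYTES_WRITTEN counter are all assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class LocalWriteLimitSketch {

    /** Hypothetical exception type used to fail the task (not a Hadoop class). */
    static class WriteLimitExceededException extends RuntimeException {
        WriteLimitExceededException(String message) {
            super(message);
        }
    }

    /** Hypothetical helper that compares a bytes-written counter to a limit. */
    static class LocalWriteLimitChecker {
        // Stand-in for reading the local-disk BYTES_WRITTEN counter.
        private final LongSupplier bytesWritten;
        // Configured threshold in bytes; a negative value disables the check
        // (an assumption of this sketch).
        private final long limitBytes;

        LocalWriteLimitChecker(LongSupplier bytesWritten, long limitBytes) {
            this.bytesWritten = bytesWritten;
            this.limitBytes = limitBytes;
        }

        /** Throws if the counter has gone beyond the configured limit. */
        void check() {
            if (limitBytes < 0) {
                return; // check disabled
            }
            long written = bytesWritten.getAsLong();
            if (written > limitBytes) {
                throw new WriteLimitExceededException(
                    "local disk bytes written " + written
                        + " exceeds limit " + limitBytes);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated counter; a real task would read its own counter instead.
        AtomicLong counter = new AtomicLong();
        LocalWriteLimitChecker checker =
            new LocalWriteLimitChecker(counter::get, 1024L);

        counter.addAndGet(512L);
        checker.check(); // under the limit, nothing happens

        counter.addAndGet(1024L);
        try {
            checker.check(); // over the limit, the task should fail fast
        } catch (WriteLimitExceededException e) {
            System.out.println("failing task: " + e.getMessage());
        }
    }
}
```

In a real task, `check()` would presumably be called periodically, for example from the same loop that reports progress, so that a misbehaving task is stopped soon after it crosses the threshold. Because the counter tracks bytes written rather than disk space in use, the limit is best set well above what a healthy task is expected to write.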

Attachments


- MAPREDUCE-6489.001.patch (9 kB), 05/Oct/15 01:37, Maysam Yabandeh
- MAPREDUCE-6489.002.patch (10 kB), 05/Oct/15 05:52, Maysam Yabandeh
- MAPREDUCE-6489.003.patch (12 kB), 11/Oct/15 21:20, Maysam Yabandeh
- MAPREDUCE-6489-branch-2.003.patch (12 kB), 21/Oct/15 00:24, Maysam Yabandeh

Issue Links

is related to

- TEZ-3821 Ability to fail fast tasks that write too much to local disk (Resolved)
- MAPREDUCE-7022 Fast fail rogue jobs based on task scratch dir size (Resolved)

People

Assignee: Maysam Yabandeh

Reporter: Maysam Yabandeh

Votes: 0

Watchers: 9

Dates

Created: 23/Sep/15 15:45

Updated: 08/Dec/17 01:30

Resolved: 21/Oct/15 14:12