[MAPREDUCE-7022] Fast fail rogue jobs based on task scratch dir size - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.7.0, 2.8.0, 2.9.0
Fix Version/s: 3.1.0
Component/s: task
Labels: None
Target Version/s: 2.9.1
Hadoop Flags: Reviewed

Description

With the introduction of MAPREDUCE-6489 there are some options to kill rogue tasks based on writes to local disk. In our environment, where we mainly run Hive-based jobs, we noticed that this counter and the size of the local scratch dirs were very different. We had tasks where the BYTES_WRITTEN counter was at 300 GB and others where it was at 10 TB, both producing around 200 GB on local disk, so it didn't help us much. To extend this feature, tasks should monitor the local scratch dir size and fail if it exceeds the limit. In these cases the tasks should not be retried either; instead the job should fail fast.
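To make the proposal concrete, here is a minimal sketch of the check described above: measure the bytes actually on disk under a scratch directory and signal a non-retriable failure once a limit is exceeded. It is not taken from the attached patches. The class name `ScratchDirLimitSketch`, the exception `ScratchDirLimitExceededException`, and the convention that a negative limit disables the check are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class ScratchDirLimitSketch {

    /** Hypothetical marker exception: the caller should fail the job, not retry the task. */
    public static class ScratchDirLimitExceededException extends RuntimeException {
        public ScratchDirLimitExceededException(String message) {
            super(message);
        }
    }

    /** Sums the sizes of all regular files currently under dir. */
    public static long dirSize(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            // A spill file may be deleted while we walk; count it as 0.
                            return 0L;
                        }
                    })
                    .sum();
        }
    }

    /**
     * Throws if the scratch dir holds more than limitBytes.
     * Assumption: a negative limit means the check is disabled.
     */
    public static void checkLimit(Path dir, long limitBytes) throws IOException {
        if (limitBytes < 0) {
            return;
        }
        long used = dirSize(dir);
        if (used > limitBytes) {
            throw new ScratchDirLimitExceededException(
                    "Scratch dir " + dir + " uses " + used
                            + " bytes, limit is " + limitBytes);
        }
    }
}
```

In a real task this check would presumably run periodically in the background. Measuring the directory rather than relying on BYTES_WRITTEN matters because, as the numbers above show, the counter can differ from the on-disk footprint by orders of magnitude.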

Attachments

- MAPREDUCE-7022.009.patch (19/Jan/18 09:02, 53 kB, Johan Gustavsson)
- MAPREDUCE-7022.008.patch (15/Jan/18 08:53, 52 kB, Johan Gustavsson)
- MAPREDUCE-7022.007.patch (15/Jan/18 05:00, 51 kB, Johan Gustavsson)
- MAPREDUCE-7022.006.patch (04/Jan/18 05:38, 46 kB, Johan Gustavsson)
- MAPREDUCE-7022.005.patch (22/Dec/17 07:03, 43 kB, Johan Gustavsson)
- MAPREDUCE-7022.004.patch (20/Dec/17 03:39, 28 kB, Johan Gustavsson)
- MAPREDUCE-7022.003.patch (20/Dec/17 03:00, 11 kB, Johan Gustavsson)
- MAPREDUCE-7022.002.patch (08/Dec/17 08:42, 30 kB, Johan Gustavsson)
- MAPREDUCE-7022.001.patch (08/Dec/17 00:59, 28 kB, Johan Gustavsson)

Issue Links

relates to: MAPREDUCE-6489 Fail fast rogue tasks that write too much to local disk (Resolved)

People

Assignee: Johan Gustavsson
Reporter: Johan Gustavsson
Votes: 0
Watchers: 3

Dates

Created: 08/Dec/17 00:56
Updated: 29/Jan/18 01:01
Resolved: 26/Jan/18 20:44