[YARN-2005] Blacklisting support for scheduling AMs - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.23.10, 2.4.0
Fix Version/s: 2.8.0, 3.0.0-alpha1
Component/s: resourcemanager
Labels: None
Hadoop Flags: Reviewed

Description

It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side.
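The request above can be sketched roughly as follows. This is an illustration only, not the code from the attached patches: the class `AmNodeBlacklist`, its method names, and the choice to track failures per application are all assumptions made for the example. It counts failed AM attempts per node and reports a node as blacklisted once the count reaches a configurable threshold.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of AM-launch blacklisting; not the class added by YARN-2005.
 * Assumption: failures are tracked per application, so one application's bad luck
 * on a node does not affect AM placement for other applications.
 */
class AmNodeBlacklist {
    // Configurable number of failed AM attempts after which a node is avoided.
    private final int maxAmFailuresPerNode;

    // appId -> (node -> number of failed AM attempts on that node)
    private final Map<String, Map<String, Integer>> failures = new HashMap<>();

    AmNodeBlacklist(int maxAmFailuresPerNode) {
        if (maxAmFailuresPerNode < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.maxAmFailuresPerNode = maxAmFailuresPerNode;
    }

    /** Record that an AM attempt of the given application failed on the given node. */
    void recordAmFailure(String appId, String node) {
        failures.computeIfAbsent(appId, k -> new HashMap<>())
                .merge(node, 1, Integer::sum);
    }

    /** True once the node has reached the failure threshold for this application. */
    boolean isBlacklisted(String appId, String node) {
        Map<String, Integer> perNode =
                failures.getOrDefault(appId, Collections.emptyMap());
        return perNode.getOrDefault(node, 0) >= maxAmFailuresPerNode;
    }

    /** Nodes the scheduler should skip when placing the next AM attempt. */
    Set<String> blacklistedNodes(String appId) {
        Set<String> result = new HashSet<>();
        Map<String, Integer> perNode =
                failures.getOrDefault(appId, Collections.emptyMap());
        for (Map.Entry<String, Integer> e : perNode.entrySet()) {
            if (e.getValue() >= maxAmFailuresPerNode) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Drop all state for an application once it finishes. */
    void removeApplication(String appId) {
        failures.remove(appId);
    }
}
```

In this sketch, a scheduler would call `recordAmFailure` whenever an AM attempt fails and consult `blacklistedNodes` before placing the next attempt. A real implementation would also need a safeguard against blacklisting so many nodes that the AM can no longer be scheduled anywhere.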

Attachments

YARN-2005.001.patch, 25/Jun/15 21:43, 40 kB, Anubhav Dhoot
YARN-2005.002.patch, 26/Jun/15 18:10, 40 kB, Anubhav Dhoot
YARN-2005.003.patch, 01/Jul/15 05:47, 41 kB, Anubhav Dhoot
YARN-2005.004.patch, 15/Jul/15 23:09, 46 kB, Anubhav Dhoot
YARN-2005.005.patch, 18/Aug/15 05:49, 48 kB, Anubhav Dhoot
YARN-2005.006.patch, 18/Aug/15 21:49, 54 kB, Anubhav Dhoot
YARN-2005.006.patch, 19/Aug/15 17:21, 54 kB, Anubhav Dhoot
YARN-2005.007.patch, 02/Sep/15 17:48, 63 kB, Anubhav Dhoot
YARN-2005.008.patch, 03/Sep/15 01:21, 67 kB, Anubhav Dhoot
YARN-2005.009.patch, 11/Sep/15 00:44, 64 kB, Anubhav Dhoot

Issue Links

duplicates

MAPREDUCE-6511 MRAppMaster second attempt starting on the same node as a previously failed MRAppMaster attempt

Resolved

YARN-2293 Scoring for NMs to identify a better candidate to launch AMs

Resolved

YARN-3744 ResourceManager should avoid allocating AM to same node repeatedly in case of AM launch failures

Resolved

is depended upon by

YARN-896 Roll up for long-lived services in YARN

Open

is duplicated by

YARN-4217 Failed AM attempt retries on same failed host

Resolved

YARN-8352 AM should retry on a different node after the previous application attempt fail

Resolved

is related to

YARN-3994 RM should respect AM resource/placement constraints

Open

YARN-4837 User facing aspects of 'AM blacklisting' feature need fixing

Resolved

YARN-3803 Application hangs after more then one localization attempt fails on the same NM

Resolved

YARN-4576 Enhancement for tracking Blacklist in AM Launching

Open

YARN-1073 NM to recognise when it can't spawn process and stop accepting containers

Open

relates to

YARN-4247 Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing events

Resolved

YARN-4685 Disable AM blacklisting by default to mitigate situations that application get hanged

Resolved

YARN-4284 condition for AM blacklisting is too narrow

Resolved

YARN-4670 add logging when a node is AM-blacklisted

Open

YARN-964 Give a parameter that can set AM retry interval

Resolved

People

Assignee: Anubhav Dhoot
Reporter: Jason Darrell Lowe
Votes: 3
Watchers: 38

Dates

Created: 30/Apr/14 14:24
Updated: 26/Feb/19 02:57
Resolved: 14/Sep/15 00:10