Hadoop YARN
YARN-4685

Disable AM blacklisting by default to mitigate situations where applications get hung

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: resourcemanager
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      AM blacklist additions and removals are propagated only when the RMAppAttempt is scheduled, i.e. in RMAppAttemptImpl#ScheduleTransition#transition. Once the attempt has been scheduled, any subsequent node removal or addition in the cluster is not reflected via BlackListManager#refreshNodeHostCount, so the BlackListManager operates on a stale NM count. As a result, the application stays in the ACCEPTED state and waits forever, even after the blacklisted nodes clear their disk space and reconnect.
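
      For illustration, a minimal, self-contained sketch of the threshold check described above; the class and method names mirror terms used in this issue but are assumptions for illustration, not the actual ResourceManager implementation:

          import java.util.Collections;
          import java.util.HashSet;
          import java.util.Set;

          // Illustrative sketch only: mimics the disable-failure-threshold check
          // described in this issue, not the real BlackListManager classes.
          class StaleCountBlacklistSketch {
            private final Set<String> blacklistedNodes = new HashSet<>();
            private final double disableFailureThreshold = 0.8; // pre-patch default
            private int numberOfNodeManagerHosts;               // only set at schedule time

            // In the buggy flow this is invoked only from ScheduleTransition, so
            // later node add/remove events never update the count.
            void refreshNodeHostCount(int clusterNodeCount) {
              this.numberOfNodeManagerHosts = clusterNodeCount;
            }

            void addNode(String host) {
              blacklistedNodes.add(host);
            }

            // The blacklist is honoured only while the blacklisted count stays below
            // the disable-failure threshold; with a stale host count the threshold may
            // never be crossed, so the app keeps skipping its only usable node.
            Set<String> effectiveBlacklist() {
              if (blacklistedNodes.size() < disableFailureThreshold * numberOfNodeManagerHosts) {
                return blacklistedNodes;
              }
              return Collections.emptySet();
            }
          }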

      Attachments

      1. YARN-4685.patch
        1 kB
        Rohith Sharma K S
      2. YARN-4685-workaround.patch
        0.9 kB
        Rohith Sharma K S

        Issue Links

          Activity

          rohithsharma Rohith Sharma K S added a comment -

           Currently, RMAppAttemptImpl calls the allocate method only when the CONTAINER_ALLOCATED event is triggered. If no container is allocated, RMAppAttemptImpl does not keep calling allocate. So even if we add code to send updated blacklist additions/removals in RMAppAttemptImpl#AMContainerAllocatedTransition#transition, it is not useful. We need to think of alternatives to handle this scenario.

          rohithsharma Rohith Sharma K S added a comment -

           One case where the application got stuck is:

           1. The cluster started with 2 nodes and 1 application was submitted.
           2. Attempt-1 failed because of a disk failure on NM-1. Attempt-2 was created with NM-1 marked as a blacklisted node.
           3. NM-2 was removed from the cluster, leaving only NM-1.
           4. Since NM-1 is blacklisted, no more containers are assigned to it.
           5. The cluster now has only one node, and it is blacklisted, so no containers are assigned to NM-1 even after it reconnects with its disk space cleared (the arithmetic sketch after this list walks through these numbers).
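
           A worked example of this two-node scenario, assuming the pre-patch 0.8 disable-failure threshold; the comparison mirrors the threshold logic described in this issue and is illustrative only:

               // Worked arithmetic for the scenario above; not ResourceManager code.
               public class TwoNodeScenarioSketch {
                 public static void main(String[] args) {
                   double threshold = 0.8;
                   int staleNodeCount = 2;  // node count captured when attempt-2 was scheduled
                   int blacklisted = 1;     // NM-1, blacklisted after the disk failure

                   // Blacklist stays active while blacklisted < threshold * nodeCount.
                   boolean activeWithStaleCount = blacklisted < threshold * staleNodeCount; // 1 < 1.6 -> true
                   // Had the count been refreshed to 1 after NM-2 left, 1 < 0.8 would be
                   // false, the blacklist would be dropped, and NM-1 could host the AM.
                   boolean activeWithFreshCount = blacklisted < threshold * 1;               // false

                   System.out.println("stale count keeps blacklist: " + activeWithStaleCount);
                   System.out.println("fresh count keeps blacklist: " + activeWithFreshCount);
                 }
               }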
          vinodkv Vinod Kumar Vavilapalli added a comment -

           There are simpler cases that are broken too. For example, if an AM failed on a node, that node will never be considered again for launching this app's AM as long as it is within the blacklist threshold. In a busy cluster where this node continues to be the only free one for a while, we will keep skipping the machine.

          rohithsharma Rohith Sharma K S added a comment -

           Initially I thought to fix this by making another allocate call whenever there is a node update event to RMApp->RMAppImpl. But there could be a case where the new allocate call gets the master container before RMAppAttemptImpl receives the container-allocated event; in that case RMAppAttemptImpl would need its own handling mechanism. Many cases like this can occur, so this option does not work.

           Other approaches for fixing this issue are to recompute the blacklist threshold EITHER on node-added and node-removed events OR on every heartbeat, for all apps that are waiting for AM container allocation, and to update AppSchedulingInfo for the AM blacklist.
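
           A hypothetical sketch of the first alternative (recompute on node-added/node-removed events); the interface, handler, and method names here are assumptions for illustration, not existing YARN code:

               import java.util.List;
               import java.util.concurrent.CopyOnWriteArrayList;

               // Hypothetical wiring: push the live node count to every attempt that is
               // still waiting for its AM container whenever the cluster changes.
               class NodeEventRecomputeSketch {
                 interface PendingAmBlacklist {
                   void refreshNodeHostCount(int clusterNodeCount);
                 }

                 // Blacklist state of attempts still waiting for their AM container.
                 private final List<PendingAmBlacklist> pendingAttempts = new CopyOnWriteArrayList<>();

                 void register(PendingAmBlacklist attemptBlacklist) {
                   pendingAttempts.add(attemptBlacklist);
                 }

                 // Would be called from the scheduler's node-added/node-removed handling
                 // so every waiting attempt re-evaluates its threshold against the live count.
                 void onClusterNodeCountChanged(int newCount) {
                   for (PendingAmBlacklist b : pendingAttempts) {
                     b.refreshNodeHostCount(newCount);
                   }
                 }
               }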

          sunilg Sunil G added a comment -

           I agree with your point, Rohith Sharma K S.

           We have a blacklistManager per RMAppAttempt, so to operate on the blacklistManager we have to pass a reference to the scheduler. Assuming we go with your second approach: on each heartbeat call we check for a pending AM container resource request, and for such a resource request we re-compute the blacklist threshold in the blacklistManager if needed (i.e. if some nodes were added or removed recently). If the threshold changes, we remove the blacklist for this ResourceRequest.

           But we would need to change a lot of interface/API signatures. If we had a common BlackListManager that keeps track of blacklist information for all apps, it would have been cleaner.
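
           A hypothetical sketch of the "common BlackListManager" idea above: one RM-wide manager tracking AM blacklist entries for every app, so the node count can be refreshed in one place. All names are illustrative assumptions, not existing YARN classes:

               import java.util.Collections;
               import java.util.Map;
               import java.util.Set;
               import java.util.concurrent.ConcurrentHashMap;

               class CommonAmBlacklistSketch {
                 private final Map<String, Set<String>> blacklistByAppId = new ConcurrentHashMap<>();
                 private final double disableFailureThreshold;
                 private volatile int clusterNodeCount;

                 CommonAmBlacklistSketch(double disableFailureThreshold) {
                   this.disableFailureThreshold = disableFailureThreshold;
                 }

                 void addNode(String appId, String host) {
                   blacklistByAppId.computeIfAbsent(appId, k -> ConcurrentHashMap.newKeySet()).add(host);
                 }

                 // Single refresh point, driven by node-added/node-removed events or heartbeats.
                 void onClusterNodeCountChanged(int newCount) {
                   this.clusterNodeCount = newCount;
                 }

                 Set<String> effectiveBlacklist(String appId) {
                   Set<String> nodes = blacklistByAppId.getOrDefault(appId, Collections.emptySet());
                   return nodes.size() < disableFailureThreshold * clusterNodeCount
                       ? nodes : Collections.emptySet();
                 }
               }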

          rohithsharma Rohith Sharma K S added a comment -

           Some of the points brought up in an offline discussion with Sunil G and Varun Vasudev are:

           1. The default value for the maximum threshold is 0.8. This should be reduced to 0.1 (10%) or 0.2 (20%). As Vinod commented previously in this JIRA, in a real production cluster, blacklisting 80% of the nodes for one app is very prone to be problematic if 20% of the nodes are always busy.
           2. Once the attempt is scheduled, there is no way to update the scheduler with blacklist additions/removals. The existing allocate API is used for updating the AM's blacklisted nodes, but reusing the same API for AM blacklist updates from RMAppAttempt is problematic: since allocate returns an Allocation object, many RMAppAttempt state-machine transitions would need to be handled and many race conditions would appear. A better way to update the scheduler with blacklisted nodes is to trigger an update event from RMAppAttempt for the AM blacklist nodes; this keeps the YarnScheduler interface compatible.
          leftnoteasy Wangda Tan added a comment -

           Rohith Sharma K S, it seems to me that there's no consensus yet about how to fix this problem. Could we move this to 2.9?

          rohithsharma Rohith Sharma K S added a comment -

           Since this issue was introduced by YARN-2005, which is committed to branch-2.8, should YARN-2005 be reverted until the right solution is decided? The biggest challenge with reverting is that many patches have been committed on top of YARN-2005.
           OR should we go ahead and change the default threshold to 0.2 for the 2.8 release? Any thoughts?

          leftnoteasy Wangda Tan added a comment -

          Rohith Sharma K S,

           I discussed this with Vinod Kumar Vavilapalli; one solution is to update DEFAULT_AM_BLACKLIST_ENABLED to false and change the default threshold from 0.8 to 0.2. We can open a separate JIRA for a longer-term fix of this issue. Sounds like a plan?

          rohithsharma Rohith Sharma K S added a comment -

          OK. I will upload a patch with changes.

          rohithsharma Rohith Sharma K S added a comment -

           Updated the patch with two configuration changes (see the sketch after this list):

           1. Reduced the blacklisting threshold to 20%.
           2. Set the default value for blacklist-enabled to false.
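
           For reference, a rough sketch of the shape of the change; the property keys and constant names below are assumptions based on this discussion, not a verbatim copy of the committed YarnConfiguration patch:

               // Rough sketch of the new defaults; names are assumptions from this
               // discussion, not the exact YarnConfiguration constants.
               public final class AmBlacklistDefaultsSketch {
                 public static final String AM_BLACKLISTING_ENABLED =
                     "yarn.am.blacklisting.enabled";
                 // Previously true; AM blacklisting is now off by default.
                 public static final boolean DEFAULT_AM_BLACKLISTING_ENABLED = false;

                 public static final String AM_BLACKLISTING_DISABLE_THRESHOLD =
                     "yarn.am.blacklisting.disable-failure-threshold";
                 // Previously 0.8f; at most ~20% of the cluster can now be blacklisted per app.
                 public static final float DEFAULT_AM_BLACKLISTING_DISABLE_THRESHOLD = 0.2f;

                 private AmBlacklistDefaultsSketch() {
                 }
               }

           Operators who still want per-app AM blacklisting can switch it back on by setting the enabled property to true in yarn-site.xml, assuming the key names above.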
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 22s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 6m 44s trunk passed
          +1 compile 0m 23s trunk passed
          +1 checkstyle 0m 15s trunk passed
          +1 mvnsite 0m 26s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 0s trunk passed
          +1 javadoc 0m 16s trunk passed
          +1 mvninstall 0m 21s the patch passed
          +1 compile 0m 20s the patch passed
          +1 javac 0m 20s the patch passed
          +1 checkstyle 0m 13s the patch passed
          +1 mvnsite 0m 23s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 4s the patch passed
          +1 javadoc 0m 14s the patch passed
          +1 unit 0m 22s hadoop-yarn-api in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          13m 38s



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824510/YARN-4685.patch
          JIRA Issue YARN-4685
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 5b685bd44176 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 8179f9a
          Default Java 1.8.0_101
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12830/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/12830/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          leftnoteasy Wangda Tan added a comment -

          +1 to latest patch, will commit shortly.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10313 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10313/)
          YARN-4685. Disable AM blacklisting by default to mitigate situations (wangda: rev 2da32a6ef9edebd86ca9672d10ce35b5a46818cc)

          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          leftnoteasy Wangda Tan added a comment -

          Committed to trunk/branch-2/branch-2.8, thanks Rohith Sharma K S for the patch and thanks Sunil G for reviews!


            People

            • Assignee:
              rohithsharma Rohith Sharma K S
            • Reporter:
              rohithsharma Rohith Sharma K S
            • Votes:
              0
            • Watchers:
              14

              Dates

              • Created:
                Updated:
                Resolved:
