[YARN-4284] condition for AM blacklisting is too narrow - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.8.0
Fix Version/s: 2.8.0, 3.0.0-alpha1
Component/s: resourcemanager
Labels:
None

Hadoop Flags:

Reviewed

Description

Per ~~YARN-2005~~, there is now a way to blacklist nodes for AM purposes so the next app attempt can be assigned to a different node.

However, currently the condition under which the node gets blacklisted is limited to DISKS_FAILED. There are a whole host of other issues that may cause the failure, for which we want to locate the AM elsewhere; e.g. disks full, JVM crashes, memory issues, etc.

Since the AM blacklisting is per-app, there is little practical downside in blacklisting the nodes on any failure (although it might lead to blacklisting the node more aggressively than necessary). I would propose locating the next app attempt to a different node on any failure.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

YARN-4284.001.patch
21/Oct/15 04:46
2 kB
Sangjin Lee
YARN-4284.002.patch
23/Oct/15 23:54
9 kB
Sangjin Lee

Issue Links

is related to

YARN-4576 Enhancement for tracking Blacklist in AM Launching

Open

YARN-2005 Blacklisting support for scheduling AMs

Resolved

Activity

People

Assignee:: Sangjin Lee

Reporter:: Sangjin Lee

Votes:: 0 Vote for this issue

Watchers:: 14 Start watching this issue

Dates

Created:: 21/Oct/15 04:22

Updated:: 30/Aug/16 01:15

Resolved:: 26/Oct/15 19:56