SPARK-16554: Spark should kill executors when they are blacklisted

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: Scheduler
    • Labels: None

      Description

      SPARK-8425 will allow blacklisting faulty executors and nodes. However, blacklisted executors will continue to run. This is bad for a few reasons:

      (1) Even with faulty hardware, if the cluster is under-utilized Spark may be able to request another executor on a different node.

      (2) If there is a faulty disk (the most common kind of hardware fault), the cluster manager may be able to allocate another executor on the same node, provided it can exclude the bad disk. (YARN will do this with its disk-health checker.)

      With dynamic allocation this may seem less critical, since a blacklisted executor stops running new tasks and is eventually reclaimed. However, if an executor holds cached data, it will not be killed until spark.dynamicAllocation.cachedExecutorIdleTimeout expires, which is effectively infinite by default.

      Users may not always want to kill bad executors, so this must be configurable to some extent. At a minimum, it should be possible to enable or disable the behavior; perhaps an executor should only be killed after it has been blacklisted a configurable N times.
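
      As a minimal sketch of the knobs involved (the spark.blacklist.* property names below match the configuration that shipped in Spark 2.2.0; the threshold and timeout values are purely illustrative):

        import org.apache.spark.SparkConf

        // Opt in to blacklisting and to killing blacklisted executors.
        // Both settings are off by default, so existing behavior is unchanged.
        val conf = new SparkConf()
          .setAppName("blacklist-kill-example")
          // Enable the blacklisting mechanism from SPARK-8425.
          .set("spark.blacklist.enabled", "true")
          // Kill executors once they are blacklisted for the whole application
          // (for node-level blacklisting, all executors on the node).
          .set("spark.blacklist.killBlacklistedExecutors", "true")
          // Illustrative threshold: failed tasks before an executor is
          // blacklisted for the entire application.
          .set("spark.blacklist.application.maxFailedTasksPerExecutor", "2")
          // With dynamic allocation, a finite timeout here keeps cached data
          // from pinning a bad executor forever (the default is effectively
          // infinite).
          .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "10m")

      Keeping the kill switch separate from spark.blacklist.enabled addresses the requirement above that users may not always want bad executors killed.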


          Activity

          mridulm80 Mridul Muralidharan added a comment -

          It would also be good if we could move currently hosted blocks off the node before it is killed, as a best-effort measure.
          This would reduce recomputation (if there is only a single copy of an RDD block) and/or prevent under-replication (if block replication > 1).
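
          For context on the two replication cases, replication is chosen per-persist via the StorageLevel; a minimal sketch (rdd and replicatedRdd are hypothetical RDDs standing in for any cached data):

            import org.apache.spark.storage.StorageLevel

            // Single copy: losing the hosting executor forces recomputation
            // of the block from lineage.
            rdd.persist(StorageLevel.MEMORY_ONLY)

            // Two copies (the "_2" levels): losing one executor leaves the
            // block under-replicated but still available, so migrating blocks
            // off a doomed executor would restore the replication factor
            // instead of recomputing.
            replicatedRdd.persist(StorageLevel.MEMORY_AND_DISK_2)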

          jsoltren Jose Soltren added a comment -

          This is a nice idea but I'm not sure how feasible it is.

          If we're going to kill an executor, it is because there has already been a task failure that makes the executor suspect. There could be a network problem, or lost or corrupt local storage. Even if we could recover hosted blocks off the executor, I'm not certain how trustworthy they would be.

          There would need to be some mechanism in place to determine if recovered blocks are valid, and perhaps a maximum number of tries or a timeout for this step as well. I'm not certain how often this would provide a benefit as compared to simply recomputing using lineage.

          I'll continue to think through this as I look further at the code.

          jsoltren Jose Soltren added a comment -

          I have some changes ready, but I'm going to wait for https://github.com/apache/spark/pull/16346 to land before sending a pull request, to avoid churn. Hopefully this happens in the next day or two.

          jsoltren Jose Soltren added a comment -

          Builds on some BlacklistTracker changes that should land first.

          apachespark Apache Spark added a comment -

          User 'jsoltren' has created a pull request for this issue:
          https://github.com/apache/spark/pull/16650

          irashid Imran Rashid added a comment -

          Issue resolved by pull request 16650
          https://github.com/apache/spark/pull/16650


            People

            • Assignee: jsoltren Jose Soltren
            • Reporter: irashid Imran Rashid
            • Shepherd: Imran Rashid
            • Votes: 0
            • Watchers: 6
