Details
- Task
- Status: Resolved
- Major
- Resolution: Fixed
- 2.0.0
- None
Description
This is a step toward SPARK-8425; see the design doc on that JIRA for a complete discussion of blacklisting.
To enable incremental review, the first step proposed here is to expand the blacklisting within tasksets. Specifically, this will enable blacklisting for:
- (task, executor) pairs (this already exists via an undocumented config)
- (task, node)
- (taskset, executor)
- (taskset, node)
Adding (task, node) is critical to making Spark tolerant of a single bad disk in a cluster without requiring careful tuning of "spark.task.maxFailures". The other additions are also important for avoiding many misleading task failures and long scheduling delays when there is one bad node in a large cluster.
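The four levels listed above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's scheduler code: the class name, method names, and all four thresholds are hypothetical and chosen for illustration, and the real rules are described in the design doc on SPARK-8425.

```python
from collections import defaultdict


class TaskSetBlacklistSketch:
    """Toy per-taskset blacklist. All names and thresholds are hypothetical."""

    def __init__(self, task_exec_limit=1, task_node_limit=2,
                 exec_limit=2, node_limit=2):
        self.task_exec_limit = task_exec_limit  # failures of one task on one executor
        self.task_node_limit = task_node_limit  # failures of one task on one node
        self.exec_limit = exec_limit            # distinct failed tasks on one executor
        self.node_limit = node_limit            # blacklisted executors on one node
        self.exec_failures = defaultdict(lambda: defaultdict(int))  # executor -> task -> count
        self.node_execs = defaultdict(set)      # node -> executors that saw a failure

    def record_failure(self, task, executor, node):
        self.exec_failures[executor][task] += 1
        self.node_execs[node].add(executor)

    def _failures(self, task, executor):
        return self.exec_failures.get(executor, {}).get(task, 0)

    def task_executor_blacklisted(self, task, executor):
        return self._failures(task, executor) >= self.task_exec_limit

    def task_node_blacklisted(self, task, node):
        # Sum this task's failures over every executor seen failing on the node.
        total = sum(self._failures(task, e) for e in self.node_execs.get(node, ()))
        return total >= self.task_node_limit

    def executor_blacklisted(self, executor):
        # (taskset, executor): enough distinct tasks have failed on this executor.
        return len(self.exec_failures.get(executor, {})) >= self.exec_limit

    def node_blacklisted(self, node):
        # (taskset, node): enough executors on this node are blacklisted.
        bad = sum(1 for e in self.node_execs.get(node, ())
                  if self.executor_blacklisted(e))
        return bad >= self.node_limit

    def can_schedule(self, task, executor, node):
        return not (self.task_executor_blacklisted(task, executor)
                    or self.task_node_blacklisted(task, node)
                    or self.executor_blacklisted(executor)
                    or self.node_blacklisted(node))


# Task 0 fails on two executors that share node "n1" (for example, one bad disk).
bl = TaskSetBlacklistSketch()
bl.record_failure(task=0, executor="e1", node="n1")
bl.record_failure(task=0, executor="e2", node="n1")
print(bl.can_schedule(0, "e3", "n1"))  # a fresh executor on the same node
print(bl.can_schedule(0, "e4", "n2"))  # an executor on a different node
```

In this model, once task 0 has failed on two executors of the same node, the (task, node) rule keeps it off that node entirely, so retries are not spent on other executors that share the bad disk. Other tasks may still run there until the taskset-level thresholds are reached.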
Issue Links
- blocks
  - SPARK-8425 Add blacklist mechanism for task scheduling (Resolved)
- is duplicated by
  - SPARK-4681 Turn on executor level blacklisting by default (Closed)
- links to