SPARK-17675: Add Blacklisting of Executors & Nodes within one TaskSet

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: Scheduler
    • Labels:
      None

      Description

      This is a step toward SPARK-8425; see the design doc on that JIRA for a complete discussion of blacklisting.

      To enable incremental review, the first step proposed here is to expand blacklisting within a taskset. Specifically, this will enable blacklisting for:

      • (task, executor) pairs (this already exists via an undocumented config)
      • (task, node)
      • (taskset, executor)
      • (taskset, node)
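The four levels above can be pictured as one small per-TaskSet data structure. The following is a minimal sketch for illustration only, not Spark's scheduler code. The class `TaskSetBlacklistSketch`, its methods, and the threshold parameters with their defaults are all hypothetical. The real settings in Spark 2.1 live under the `spark.blacklist.*` prefix.

```python
from collections import defaultdict


class TaskSetBlacklistSketch:
    """Illustrative blacklist for one TaskSet (hypothetical, not Spark's code).

    Tracks failures at four levels: (task, executor), (task, node),
    (taskset, executor) and (taskset, node). Threshold names and defaults
    are assumptions chosen for this example.
    """

    def __init__(self, exec_to_host, max_attempts_per_executor=1,
                 max_attempts_per_node=2, max_failed_tasks_per_executor=2,
                 max_failed_executors_per_node=2):
        self.exec_to_host = dict(exec_to_host)
        self.max_attempts_per_executor = max_attempts_per_executor
        self.max_attempts_per_node = max_attempts_per_node
        self.max_failed_tasks_per_executor = max_failed_tasks_per_executor
        self.max_failed_executors_per_node = max_failed_executors_per_node
        # (task index, executor id) -> failed attempts of that task on that executor
        self.task_exec_failures = defaultdict(int)
        # (task index, host) -> failed attempts of that task on any executor of that host
        self.task_node_failures = defaultdict(int)
        # executor id -> distinct task indices that have failed on it in this TaskSet
        self.exec_failed_tasks = defaultdict(set)
        self.blacklisted_execs = set()   # (taskset, executor)
        self.blacklisted_nodes = set()   # (taskset, node)

    def record_failure(self, task, executor):
        """Update every level after `task` fails on `executor`."""
        host = self.exec_to_host[executor]
        self.task_exec_failures[(task, executor)] += 1
        self.task_node_failures[(task, host)] += 1
        self.exec_failed_tasks[executor].add(task)
        # (taskset, executor): too many distinct tasks have failed on this executor.
        if len(self.exec_failed_tasks[executor]) >= self.max_failed_tasks_per_executor:
            self.blacklisted_execs.add(executor)
        # (taskset, node): too many executors on this host are blacklisted.
        bad_on_host = [e for e in self.blacklisted_execs if self.exec_to_host[e] == host]
        if len(bad_on_host) >= self.max_failed_executors_per_node:
            self.blacklisted_nodes.add(host)

    def can_run(self, task, executor):
        """True if none of the four levels forbids running `task` on `executor`."""
        host = self.exec_to_host[executor]
        return not (
            self.task_exec_failures.get((task, executor), 0) >= self.max_attempts_per_executor
            or self.task_node_failures.get((task, host), 0) >= self.max_attempts_per_node
            or executor in self.blacklisted_execs
            or host in self.blacklisted_nodes
        )


# Example: hostA has three executors, hostB has one.
bl = TaskSetBlacklistSketch({"exec-1": "hostA", "exec-2": "hostA",
                             "exec-3": "hostA", "exec-4": "hostB"})
bl.record_failure(0, "exec-1")
print(bl.can_run(0, "exec-1"), bl.can_run(0, "exec-2"))  # False True: (task, executor)
bl.record_failure(0, "exec-2")
print(bl.can_run(0, "exec-3"), bl.can_run(0, "exec-4"))  # False True: (task, node)
bl.record_failure(1, "exec-1")
bl.record_failure(1, "exec-2")
print(bl.can_run(2, "exec-3"), bl.can_run(2, "exec-4"))  # False True: (taskset, node)
```

The node-level decisions are derived from executor-level failures through the executor-to-host map. A single bad node therefore shows up first as repeated failures on its executors and only then gets excluded as a whole.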

      In particular, adding (task, node) blacklisting is critical to making Spark tolerant of a single bad disk in a cluster without requiring careful tuning of "spark.task.maxFailures". The other additions are also important for avoiding many misleading task failures and long scheduling delays when a single node in a large cluster is bad.
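A toy simulation shows why (task, node) blacklisting matters for a bad disk. It deliberately assumes a pessimistic scheduler that always prefers executors on the bad node, for instance because the task's input is local to it. Spark's scheduler is not guaranteed to behave this way. The function `run_task_on_bad_node`, its parameters, and the per-node threshold of 2 are hypothetical. The value 4 is the default of `spark.task.maxFailures`.

```python
def run_task_on_bad_node(executors, bad_host, max_task_failures, node_blacklisting,
                         max_attempts_per_node=2):
    """Simulate retries of one task whose attempts always fail on `bad_host`.

    `executors` is a list of (executor id, host) pairs in the order a
    hypothetical scheduler prefers them. (task, executor) blacklisting is
    always on; (task, node) blacklisting is optional.
    Returns (outcome, number of failed attempts).
    """
    failed_execs = set()
    node_failures = {}
    failures = 0
    while True:
        candidates = [
            (e, h) for e, h in executors
            if e not in failed_execs  # (task, executor) blacklisting
            and not (node_blacklisting
                     and node_failures.get(h, 0) >= max_attempts_per_node)  # (task, node)
        ]
        if not candidates:
            return ("unschedulable", failures)
        executor, host = candidates[0]
        if host != bad_host:
            return ("success", failures)
        failures += 1
        failed_execs.add(executor)
        node_failures[host] = node_failures.get(host, 0) + 1
        if failures >= max_task_failures:
            return ("aborted", failures)  # the job fails once the task hits maxFailures


# Four executors share the node with the bad disk; one healthy node remains.
execs = [("e1", "bad"), ("e2", "bad"), ("e3", "bad"), ("e4", "bad"), ("e5", "good")]
print(run_task_on_bad_node(execs, "bad", max_task_failures=4, node_blacklisting=False))
# ('aborted', 4)
print(run_task_on_bad_node(execs, "bad", max_task_failures=4, node_blacklisting=True))
# ('success', 2)
```

With only (task, executor) blacklisting, the task uses up all four allowed failures without ever leaving the bad node. Raising `spark.task.maxFailures` above the number of executors per node would avoid this, but that is exactly the careful tuning the description wants to make unnecessary.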

              People

              • Assignee: Imran Rashid (irashid)
              • Reporter: Imran Rashid (irashid)