Spark / SPARK-17675

Add Blacklisting of Executors & Nodes within one TaskSet

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: Scheduler, Spark Core
    • Labels: None

    Description

      This is a step toward SPARK-8425; see the design doc on that JIRA issue for a complete discussion of blacklisting.

      To enable incremental review, the first step proposed here is to expand blacklisting within tasksets. Specifically, this will enable blacklisting for:

      • (task, executor) pairs (this already exists via an undocumented config)
      • (task, node)
      • (taskset, executor)
      • (taskset, node)

      Adding (task, node) in particular is critical to making Spark tolerant of a single bad disk in a cluster without requiring careful tuning of "spark.task.maxFailures". The other additions are also important for avoiding many misleading task failures and long scheduling delays when there is one bad node in a large cluster.
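
The four levels listed above can be sketched as a small bookkeeping structure. This is only an illustrative toy model in Python, not Spark's scheduler code: the class name `TaskSetBlacklistSketch`, its methods, and all four threshold parameters (including their default values) are assumptions made up for this example. It shows how a failure recorded at the (task, executor) level can escalate to (task, node), (taskset, executor), and (taskset, node).

```python
from collections import defaultdict


class TaskSetBlacklistSketch:
    """Toy model of blacklisting within one taskset (hypothetical, not Spark's API)."""

    def __init__(self,
                 max_task_attempts_per_executor=1,
                 max_task_attempts_per_node=2,
                 max_failed_tasks_per_executor=2,
                 max_failed_executors_per_node=2):
        # All thresholds are hypothetical values chosen for illustration.
        self.max_task_attempts_per_executor = max_task_attempts_per_executor
        self.max_task_attempts_per_node = max_task_attempts_per_node
        self.max_failed_tasks_per_executor = max_failed_tasks_per_executor
        self.max_failed_executors_per_node = max_failed_executors_per_node
        # (task, executor) -> number of failed attempts
        self.task_exec_failures = defaultdict(int)
        # (task, node) -> number of failed attempts
        self.task_node_failures = defaultdict(int)
        # executor -> distinct tasks that have failed on it
        self.exec_failed_tasks = defaultdict(set)
        # node -> executors on it that are blacklisted for the whole taskset
        self.node_blacklisted_execs = defaultdict(set)

    def record_failure(self, task, executor, node):
        """Record one failed attempt of `task` on `executor`, which runs on `node`."""
        self.task_exec_failures[(task, executor)] += 1
        self.task_node_failures[(task, node)] += 1
        self.exec_failed_tasks[executor].add(task)
        # Enough distinct failed tasks: blacklist the executor for the taskset.
        if len(self.exec_failed_tasks[executor]) >= self.max_failed_tasks_per_executor:
            self.node_blacklisted_execs[node].add(executor)

    def can_run(self, task, executor, node):
        """Return True if `task` may be scheduled on `executor` on `node`."""
        bad_execs = self.node_blacklisted_execs.get(node, set())
        # (taskset, node): too many blacklisted executors on this node.
        if len(bad_execs) >= self.max_failed_executors_per_node:
            return False
        # (taskset, executor)
        if executor in bad_execs:
            return False
        # (task, node)
        if self.task_node_failures.get((task, node), 0) >= self.max_task_attempts_per_node:
            return False
        # (task, executor)
        if self.task_exec_failures.get((task, executor), 0) >= self.max_task_attempts_per_executor:
            return False
        return True


# Usage: task 0 keeps failing on host-A (for example, because of one bad disk).
bl = TaskSetBlacklistSketch()
bl.record_failure(task=0, executor="exec-1", node="host-A")
bl.record_failure(task=0, executor="exec-2", node="host-A")
# After two failures on the same node, task 0 is steered to another node
# instead of using up its remaining attempts on host-A.
retry_on_same_node = bl.can_run(0, "exec-3", "host-A")   # False
retry_on_other_node = bl.can_run(0, "exec-9", "host-B")  # True
```

In this model the (task, node) check is what keeps a task from repeatedly landing on the same faulty host, which is the scenario the description ties to tuning "spark.task.maxFailures".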

            People

              Assignee: Imran Rashid (irashid)
              Reporter: Imran Rashid (irashid)
              Votes: 0
              Watchers: 2
