Spark / SPARK-16554

Spark should kill executors when they are blacklisted

Details

• Type: New Feature
• Status: Resolved
• Priority: Major
• Resolution: Fixed
• Affects Version/s: None
• Fix Version/s: 2.2.0
• Component/s: Scheduler
• Labels: None

Description

SPARK-8425 will allow blacklisting faulty executors and nodes. However, blacklisted executors will continue to run. This is bad for a few reasons:

(1) Even with faulty hardware, if the cluster is under-utilized Spark may be able to request another executor on a different node.

(2) If there is a faulty disk (the most common case of faulty hardware), the cluster manager may be able to allocate another executor on the same node, provided it can exclude the bad disk. (YARN does this with its disk-health checker.)

With dynamic allocation this may seem less critical, since a blacklisted executor will stop running new tasks and eventually be reclaimed. However, if there is cached data on those executors, they will not be killed until spark.dynamicAllocation.cachedExecutorIdleTimeout expires, which is (effectively) infinite by default.

Users may not always want to kill bad executors, so this must be configurable to some extent. At a minimum, it should be possible to enable or disable it; perhaps the executor should be killed after it has been blacklisted a configurable N times.
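The knobs requested here would plausibly live alongside Spark's existing blacklist and dynamic-allocation settings. A minimal spark-defaults.conf sketch follows; the exact key names (other than spark.dynamicAllocation.cachedExecutorIdleTimeout, which appears in the description above) should be treated as assumptions, not confirmed configuration:

```properties
# Sketch of a spark-defaults.conf fragment; key names below the first are
# assumptions modeled on Spark's blacklist configuration namespace.

# Enable executor/node blacklisting (the SPARK-8425 feature).
spark.blacklist.enabled                            true

# The requested on/off switch: actively kill an executor once it has
# been blacklisted, rather than letting it idle (hypothetical key).
spark.blacklist.killBlacklistedExecutors           true

# Without a kill, a blacklisted executor holding cached blocks is never
# reclaimed by dynamic allocation, because this timeout is effectively
# infinite by default. Lowering it is a partial workaround.
spark.dynamicAllocation.cachedExecutorIdleTimeout  1h
```

With killing enabled, the scheduler can also request a replacement executor from the cluster manager, which may land on a healthy node (case 1 above) or on the same node with the bad disk excluded (case 2).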


People

• Assignee: Jose Soltren (jsoltren)
• Reporter: Imran Rashid (irashid)
• Shepherd: Imran Rashid
• Votes: 0
• Watchers: 6
