Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-41550 Dynamic Allocation on K8S GA
  3. SPARK-38019

ExecutorMonitor.timedOutExecutors should be deterministic

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.2.0, 3.2.1, 3.3.0
    • 3.3.0, 3.2.2
    • Spark Core
    • None

    Description

      Since the AS-IS timedOutExecutors returns the result indeterministic, it kills the executors in a random order at Dynamic Allocation setting.

      spark/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala

       private val executors = new ConcurrentHashMap[String, Tracker]() 
      ...
       timedOutExecs = executors.asScala 
      

      This random behavior not only makes the users confusing but also causes a K8s decommission tests flaky like the following case in Java 17 on Apple Silicon environment. The K8s test expects the decommission of executor 1 while the executor 2 is chosen at this time.

      22/01/25 06:11:16 DEBUG ExecutorMonitor: Executors 1,2 do not have active shuffle data after job 0 finished.
      22/01/25 06:11:16 DEBUG ExecutorAllocationManager: max needed for rpId: 0 numpending: 0, tasksperexecutor: 1
      22/01/25 06:11:16 DEBUG ExecutorAllocationManager: No change in number of executors
      22/01/25 06:11:16 DEBUG ExecutorAllocationManager: Request to remove executorIds: (2,0), (1,0)
      22/01/25 06:11:16 DEBUG ExecutorAllocationManager: Not removing idle executor 1 because there are only 1 executor(s) left (minimum number of executor limit 1)
      22/01/25 06:11:16 INFO KubernetesClusterSchedulerBackend: Decommission executors: 2
      

      Attachments

        Activity

          People

            dongjoon Dongjoon Hyun
            dongjoon Dongjoon Hyun
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: