Spark / SPARK-23974

Do not allocate more containers as expected in dynamic allocation

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      When running on YARN with dynamic allocation enabled, Spark does not request more containers even though the current number of containers (executors) is below the maximum number of executors.

      For example, only 7 executors are running while the cluster is not busy, even though I have set

      `spark.dynamicAllocation.maxExecutors = 600`

      As a result, the jobs currently running in this context execute slowly.

      A live case, with logs from our production service:
      ```
      $ grep "Not adding executors because our current target total" spark-job-server.log.9 | tail
      [2018-04-12 16:07:19,070] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 16:07:20,071] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 16:07:21,072] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 16:07:22,073] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 16:07:23,074] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 16:07:24,075] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 16:07:25,076] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 16:07:26,077] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 16:07:27,078] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 16:07:28,079] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)

      $ grep "Not adding executors because our current target total" spark-job-server.log.9 | head
      [2018-04-12 13:52:18,067] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 13:52:19,071] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 13:52:20,072] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 13:52:21,073] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 13:52:22,074] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 13:52:23,075] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 13:52:24,076] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 13:52:25,077] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 13:52:26,078] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
      [2018-04-12 13:52:27,079] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)

      $ grep "Not adding executors because our current target total" spark-job-server.log.9 | wc -l
      8111
      ```
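The `grep | head`, `grep | tail` and `grep | wc -l` pipeline above can be reproduced in a short script that also extracts the target and limit values and the time span the messages cover. This is a minimal sketch that assumes every relevant line has exactly the layout of the sample lines quoted in this issue; `summarize` and `LINE_RE` are hypothetical helper names, not part of Spark or the job server.

```python
import re
from datetime import datetime

# Matches the DEBUG lines quoted in this issue. The layout (bracketed
# timestamp, level, logger, message) is assumed from these samples only.
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\] DEBUG .*"
    r"Not adding executors because our current target total is already "
    r"(?P<target>\d+) \(limit (?P<limit>\d+)\)"
)


def summarize(lines):
    """Count matching lines, record the first/last timestamp and
    every distinct (target, limit) pair seen."""
    count = 0
    first = last = None
    pairs = set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a "Not adding executors" line
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        if first is None:
            first = ts
        last = ts
        pairs.add((int(m.group("target")), int(m.group("limit"))))
        count += 1
    return count, first, last, pairs


if __name__ == "__main__":
    # First and last lines of the grep output quoted above.
    sample = [
        "[2018-04-12 13:52:18,067] DEBUG .ExecutorAllocationManager [] "
        "[akka://JobServer/user/jobManager] - Not adding executors because "
        "our current target total is already 600 (limit 600)",
        "[2018-04-12 16:07:28,079] DEBUG .ExecutorAllocationManager [] "
        "[akka://JobServer/user/jobManager] - Not adding executors because "
        "our current target total is already 600 (limit 600)",
    ]
    count, first, last, pairs = summarize(sample)
    print(count, last - first, pairs)  # 2 2:15:10 {(600, 600)}
    # On the real file: with open("spark-job-server.log.9") as f: summarize(f)
```

Applied to the quoted head and tail lines, the span is 2:15:10, about 8110 seconds, which matches the 8111 lines reported by `wc -l`: the check fired roughly once per second for over two hours with the target never leaving 600.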
      These logs show that `numExecutorsTarget == maxNumExecutors == 600` held the whole time (the message repeats about once per second from 13:52:18 to 16:07:28), so no new executors were requested. Yet during that period only 7 executors were actually available to our users.
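The log message suggests that the check behind it compares the *target* executor count, not the number of running executors, against the configured maximum. The sketch below is a simplified Python model of such a guard under that assumption. `AllocationState`, `executors_to_request` and their fields are hypothetical names, not Spark's internal API; the real `ExecutorAllocationManager` is written in Scala and also ramps its requests up over successive rounds, which this model omits.

```python
from dataclasses import dataclass


@dataclass
class AllocationState:
    """Hypothetical snapshot of dynamic-allocation bookkeeping (illustrative names)."""
    num_executors_target: int  # executors requested from the cluster manager so far
    max_num_executors: int     # spark.dynamicAllocation.maxExecutors
    running_executors: int     # executors actually registered; ignored by the guard


def executors_to_request(state: AllocationState, needed: int) -> int:
    """Return how many extra executors this simplified model would request.

    The guard only compares the *target* against the cap, so a target that is
    already at the cap blocks new requests no matter how few executors run.
    Round-by-round ramp-up is deliberately not modelled.
    """
    if state.num_executors_target >= state.max_num_executors:
        print("Not adding executors because our current target total is already "
              f"{state.num_executors_target} (limit {state.max_num_executors})")
        return 0
    new_target = min(needed, state.max_num_executors)
    delta = max(new_target - state.num_executors_target, 0)
    state.num_executors_target += delta
    return delta


if __name__ == "__main__":
    # The situation from the issue: target pinned at 600, only 7 executors running.
    stuck = AllocationState(num_executors_target=600, max_num_executors=600,
                            running_executors=7)
    # Logs the "Not adding executors" line, then prints 0.
    print(executors_to_request(stuck, needed=600))
```

In this model `running_executors` never enters the decision, so a target that stays at the cap reproduces the repeated log line above while the number of live executors stays at 7.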

          People

            Assignee: Unassigned
            Reporter: Darcy Shen (sadhen)
            Votes: 1
            Watchers: 4
