Spark / SPARK-4951

A busy executor may be killed when dynamicAllocation is enabled


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.2.1, 1.3.0
    • Component/s: Spark Core
    • Labels: None

      Description

      If a task runs longer than `spark.dynamicAllocation.executorIdleTimeout`, the executor that runs this task will be killed.
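
      For illustration only (this is not Spark's actual `ExecutorAllocationManager` code): the symptom is consistent with an idle check that only considers when the last task finished on an executor and ignores tasks that are still running, so a single long task can outlive the timeout. A minimal sketch of that kind of faulty logic:

      import scala.collection.mutable

      // Illustrative sketch only, NOT Spark's actual code: an idle check that
      // forgets about still-running tasks will flag a busy executor as idle.
      class NaiveIdleTracker(idleTimeoutMs: Long) {
        // executor id -> time of the last task completion observed on it
        private val lastTaskEnd = mutable.Map.empty[String, Long]

        def onExecutorAdded(executorId: String): Unit =
          lastTaskEnd(executorId) = System.currentTimeMillis()

        def onTaskEnd(executorId: String): Unit =
          lastTaskEnd(executorId) = System.currentTimeMillis()

        // Buggy: an executor in the middle of a 50-second task still "looks idle"
        // once idleTimeoutMs has elapsed since its last *completed* task.
        def looksIdle(executorId: String, now: Long = System.currentTimeMillis()): Boolean =
          now - lastTaskEnd.getOrElse(executorId, now) > idleTimeoutMs
      }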

      The following steps (yarn-client mode) can reproduce this bug:
      1. Start `spark-shell` using

      ./bin/spark-shell --conf "spark.shuffle.service.enabled=true" \
          --conf "spark.dynamicAllocation.minExecutors=1" \
          --conf "spark.dynamicAllocation.maxExecutors=4" \
          --conf "spark.dynamicAllocation.enabled=true" \
          --conf "spark.dynamicAllocation.executorIdleTimeout=30" \
          --master yarn-client \
          --driver-memory 512m \
          --executor-memory 512m \
          --executor-cores 1
      

      2. Wait more than 30 seconds until there is only one executor.
      3. Run the following code (1000 elements over 20 partitions with a 1-second sleep per element, so each task needs at least 50 seconds to finish):

      val r = sc.parallelize(1 to 1000, 20).map{t => Thread.sleep(1000); t}.groupBy(_ % 2).collect()
      

      4. Executors keep being killed and re-allocated, which makes the job fail. (A possible mitigation is sketched below.)
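
      Until this is fixed, one possible mitigation (an assumption on my part, not the actual fix that shipped in 1.2.1/1.3.0) is to keep `spark.dynamicAllocation.executorIdleTimeout` larger than the longest task you expect to run, so the idle timer cannot fire while a task is still in flight. A minimal sketch of such a configuration (the application name is hypothetical; the master would still be passed via `--master yarn-client`):

      import org.apache.spark.{SparkConf, SparkContext}

      // Workaround sketch (assumption, not the committed fix): keep the idle timeout
      // above the longest expected task so busy executors are not flagged as idle.
      val conf = new SparkConf()
        .setAppName("long-task-app")                               // hypothetical name
        .set("spark.shuffle.service.enabled", "true")
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.dynamicAllocation.minExecutors", "1")
        .set("spark.dynamicAllocation.maxExecutors", "4")
        .set("spark.dynamicAllocation.executorIdleTimeout", "120") // seconds; > longest task (~50 s)
      val sc = new SparkContext(conf)

      With the timeout above the task length, the executor running the 50-second tasks from step 3 should no longer be reclaimed mid-task.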


    People

    • Assignee: zsxwing (Shixiong Zhu)
    • Reporter: zsxwing (Shixiong Zhu)
    • Votes: 0
    • Watchers: 2
