Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version: 1.2.0
- Labels: None
Description
If a task runs for longer than `spark.dynamicAllocation.executorIdleTimeout`, the executor running that task will be killed even though it is still busy.
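For intuition, here is a deliberately simplified sketch (illustrative only, not Spark's actual `ExecutorAllocationManager` logic) of how an idle-timeout check can misfire when the per-executor timer is not reset while a task is running: any single task longer than the timeout makes its executor look idle and eligible to be killed.

import scala.collection.mutable

// Hypothetical, simplified idle tracker: the timer is reset only when an
// executor is added or a task finishes, so an executor stuck in one long task
// is reported as "idle" once the timeout elapses, even though it is busy.
class NaiveIdleTracker(idleTimeoutMs: Long) {
  private val idleSince = mutable.Map[String, Long]()

  def onExecutorAdded(id: String): Unit =
    idleSince(id) = System.currentTimeMillis()

  def onTaskEnd(id: String): Unit =
    idleSince(id) = System.currentTimeMillis()

  // Executors whose timer has expired -- including ones still running a task.
  def expiredExecutors(): Seq[String] = {
    val now = System.currentTimeMillis()
    idleSince.collect { case (id, t) if now - t > idleTimeoutMs => id }.toSeq
  }
}

With a 30-second timeout and the roughly 50-second tasks used below, an executor would be flagged before its task finishes.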
The following steps (yarn-client mode) can reproduce this bug:
1. Start `spark-shell` using
./bin/spark-shell --conf "spark.shuffle.service.enabled=true" \
    --conf "spark.dynamicAllocation.minExecutors=1" \
    --conf "spark.dynamicAllocation.maxExecutors=4" \
    --conf "spark.dynamicAllocation.enabled=true" \
    --conf "spark.dynamicAllocation.executorIdleTimeout=30" \
    --master yarn-client \
    --driver-memory 512m \
    --executor-memory 512m \
    --executor-cores 1
2. Wait more than 30 seconds until there is only one executor.
3. Run the following code (a task needs at least 50 seconds to finish, since each of the 20 partitions holds 50 elements and each element sleeps for one second):
val r = sc.parallelize(1 to 1000, 20).map{t => Thread.sleep(1000); t}.groupBy(_ % 2).collect()
4. Executors are killed and re-allocated over and over, which makes the job fail; the churn can be observed with the listener sketch below.
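To make the churn from step 4 visible in the shell, the snippet below registers a listener that logs executor add/remove events while the job from step 3 runs. It uses the standard `SparkListener` executor callbacks (available in newer Spark releases) and is an observation aid added here for illustration, not part of the original report.

import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded, SparkListenerExecutorRemoved}

// Log every executor that dynamic allocation adds or removes.
sc.addSparkListener(new SparkListener {
  override def onExecutorAdded(e: SparkListenerExecutorAdded): Unit =
    println(s"executor added:   ${e.executorId}")
  override def onExecutorRemoved(e: SparkListenerExecutorRemoved): Unit =
    println(s"executor removed: ${e.executorId} (reason: ${e.reason})")
})

If the bug is hit, removal messages appear while the 50-second tasks are still running, matching the behavior described in step 4.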