Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
2.0.2, 2.1.0, 2.2.0
None
Description
We are seeing some strange behavior with dynamic allocation: in some cases the driver gets into a state where it constantly kills idle executors while requesting new ones. This starts at the end of a stage, once all tasks have been assigned, and never stops, even when there are no tasks left to run.
From the YarnAllocator logs, it looks like the allocator is getting lots of requests from the driver, even though the timeout between requests should be 5s:
17/04/20 19:52:05 INFO dispatcher-event-loop-49 YarnAllocator: Driver requested a total number of 227 executor(s).
17/04/20 19:52:05 INFO dispatcher-event-loop-30 YarnAllocator: Driver requested a total number of 213 executor(s).
17/04/20 19:52:05 INFO Reporter YarnAllocator: Will request 1 executor containers, each with 2 cores and 7168 MB memory including 2048 MB overhead
17/04/20 19:52:05 INFO Reporter YarnAllocator: Canceled 0 container requests (locality no longer needed)
17/04/20 19:52:05 INFO Reporter YarnAllocator: Submitted container request (host: Any, capability: <memory:7168, vCores:2>)
spark://CoarseGrainedScheduler@100.74.39.143:10895, executorHostname: ip-100-74-34-230.ec2.internal
spark://CoarseGrainedScheduler@100.74.39.143:10895, executorHostname: ip-100-74-47-57.ec2.internal
17/04/20 19:52:05 INFO Reporter YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them.
17/04/20 19:52:05 INFO dispatcher-event-loop-11 YarnAllocator: Driver requested a total number of 195 executor(s).
17/04/20 19:52:05 INFO dispatcher-event-loop-55 YarnAllocator: Driver requested a total number of 174 executor(s).
17/04/20 19:52:05 INFO Reporter YarnAllocator: Will request 2 executor containers, each with 2 cores and 7168 MB memory including 2048 MB overhead
17/04/20 19:52:05 INFO Reporter YarnAllocator: Canceled 0 container requests (locality no longer needed)
17/04/20 19:52:05 INFO Reporter YarnAllocator: Submitted container request (host: Any, capability: <memory:7168, vCores:2>)
17/04/20 19:52:05 INFO Reporter YarnAllocator: Submitted container request (host: Any, capability: <memory:7168, vCores:2>)
17/04/20 19:52:05 INFO Reporter YarnAllocator: Received 4 containers from YARN, launching executors on 4 of them.
I think the allocator cancels what requests it can, but it still receives containers that were already requested, and the number of executors keeps growing because of the requests from the driver. Here are 5 seconds from the log:
17/04/20 19:52:30 INFO dispatcher-event-loop-22 YarnAllocator: Driver requested a total number of 185 executor(s).
17/04/20 19:52:30 INFO dispatcher-event-loop-48 YarnAllocator: Driver requested a total number of 193 executor(s).
17/04/20 19:52:30 INFO dispatcher-event-loop-24 YarnAllocator: Driver requested a total number of 192 executor(s).
17/04/20 19:52:30 INFO dispatcher-event-loop-60 YarnAllocator: Driver requested a total number of 195 executor(s).
17/04/20 19:52:30 INFO dispatcher-event-loop-53 YarnAllocator: Driver requested a total number of 205 executor(s).
17/04/20 19:52:31 INFO dispatcher-event-loop-19 YarnAllocator: Driver requested a total number of 202 executor(s).
17/04/20 19:52:31 INFO dispatcher-event-loop-17 YarnAllocator: Driver requested a total number of 232 executor(s).
17/04/20 19:52:31 INFO dispatcher-event-loop-45 YarnAllocator: Driver requested a total number of 243 executor(s).
17/04/20 19:52:31 INFO dispatcher-event-loop-19 YarnAllocator: Driver requested a total number of 254 executor(s).
17/04/20 19:52:31 INFO dispatcher-event-loop-42 YarnAllocator: Driver requested a total number of 263 executor(s).
17/04/20 19:52:31 INFO dispatcher-event-loop-20 YarnAllocator: Driver requested a total number of 271 executor(s).
17/04/20 19:52:31 INFO dispatcher-event-loop-35 YarnAllocator: Driver requested a total number of 280 executor(s).
17/04/20 19:52:31 INFO dispatcher-event-loop-61 YarnAllocator: Driver requested a total number of 289 executor(s).
17/04/20 19:52:32 INFO dispatcher-event-loop-22 YarnAllocator: Driver requested a total number of 305 executor(s).
17/04/20 19:52:32 INFO dispatcher-event-loop-28 YarnAllocator: Driver requested a total number of 310 executor(s).
17/04/20 19:52:32 INFO dispatcher-event-loop-0 YarnAllocator: Driver requested a total number of 313 executor(s).
17/04/20 19:52:32 INFO dispatcher-event-loop-28 YarnAllocator: Driver requested a total number of 315 executor(s).
17/04/20 19:52:32 INFO dispatcher-event-loop-40 YarnAllocator: Driver requested a total number of 316 executor(s).
17/04/20 19:52:32 INFO dispatcher-event-loop-13 YarnAllocator: Driver requested a total number of 317 executor(s).
17/04/20 19:52:32 INFO dispatcher-event-loop-35 YarnAllocator: Driver requested a total number of 311 executor(s).
17/04/20 19:52:33 INFO dispatcher-event-loop-40 YarnAllocator: Driver requested a total number of 308 executor(s).
17/04/20 19:52:33 INFO dispatcher-event-loop-4 YarnAllocator: Driver requested a total number of 301 executor(s).
17/04/20 19:52:33 INFO dispatcher-event-loop-23 YarnAllocator: Driver requested a total number of 294 executor(s).
17/04/20 19:52:33 INFO dispatcher-event-loop-46 YarnAllocator: Driver requested a total number of 287 executor(s).
17/04/20 19:52:33 INFO dispatcher-event-loop-8 YarnAllocator: Driver requested a total number of 285 executor(s).
17/04/20 19:52:33 INFO dispatcher-event-loop-63 YarnAllocator: Driver requested a total number of 283 executor(s).
17/04/20 19:52:33 INFO dispatcher-event-loop-35 YarnAllocator: Driver requested a total number of 281 executor(s).
17/04/20 19:52:33 INFO dispatcher-event-loop-63 YarnAllocator: Driver requested a total number of 278 executor(s).
17/04/20 19:52:33 INFO dispatcher-event-loop-3 YarnAllocator: Driver requested a total number of 277 executor(s).
17/04/20 19:52:33 INFO dispatcher-event-loop-38 YarnAllocator: Driver requested a total number of 276 executor(s).
17/04/20 19:52:34 INFO dispatcher-event-loop-51 YarnAllocator: Driver requested a total number of 273 executor(s).
17/04/20 19:52:34 INFO dispatcher-event-loop-31 YarnAllocator: Driver requested a total number of 271 executor(s).
17/04/20 19:52:34 INFO dispatcher-event-loop-44 YarnAllocator: Driver requested a total number of 270 executor(s).
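For anyone trying to confirm the same symptom in their own application master logs, one way to quantify the request rate is to count the "Driver requested a total number of" entries per second and compare that with the expected interval. The following is only a rough sketch in Python, assuming the log format shown in the excerpts above; the helper names `requests_per_second` and `requested_totals` are made up for illustration and are not part of Spark.

```python
import re
from collections import Counter

# Pattern for the YarnAllocator entries quoted in this report, e.g.
# "17/04/20 19:52:30 INFO dispatcher-event-loop-22 YarnAllocator:
#  Driver requested a total number of 185 executor(s)."
# Assumption: the log layout matches the excerpts above.
REQUEST_RE = re.compile(
    r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO \S+ YarnAllocator: "
    r"Driver requested a total number of (\d+) executor\(s\)\."
)


def requests_per_second(log_text):
    """Count driver requests per timestamp (hypothetical helper).

    Uses findall over the whole text, so it also works when several
    log entries have been run together on a single line.
    """
    return Counter(ts for ts, _ in REQUEST_RE.findall(log_text))


def requested_totals(log_text):
    """Return the requested executor totals in log order (hypothetical helper)."""
    return [int(total) for _, total in REQUEST_RE.findall(log_text)]


# A few entries copied from the excerpt above.
sample = """\
17/04/20 19:52:30 INFO dispatcher-event-loop-22 YarnAllocator: Driver requested a total number of 185 executor(s).
17/04/20 19:52:30 INFO dispatcher-event-loop-48 YarnAllocator: Driver requested a total number of 193 executor(s).
17/04/20 19:52:30 INFO dispatcher-event-loop-24 YarnAllocator: Driver requested a total number of 192 executor(s).
17/04/20 19:52:30 INFO dispatcher-event-loop-60 YarnAllocator: Driver requested a total number of 195 executor(s).
17/04/20 19:52:30 INFO dispatcher-event-loop-53 YarnAllocator: Driver requested a total number of 205 executor(s).
17/04/20 19:52:31 INFO dispatcher-event-loop-19 YarnAllocator: Driver requested a total number of 202 executor(s).
17/04/20 19:52:31 INFO dispatcher-event-loop-17 YarnAllocator: Driver requested a total number of 232 executor(s).
"""

if __name__ == "__main__":
    for second, count in sorted(requests_per_second(sample).items()):
        print(second, count)
    print(requested_totals(sample))
```

Any second with more than one request already contradicts the expected 5s spacing; in the sample above, 19:52:30 alone has five.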