Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Fix Version/s: 3.0.0
Labels: None
Description
ExecutorPodsAllocator currently has code to delete excess pods that the K8S server hasn't started yet and that aren't needed anymore due to downscaling.
The problem is that there is a race between K8S starting the pod and the Spark code deleting it. This may cause the pod to connect back to Spark and do a lot of initialization, sometimes even being considered for task allocation, only to be killed almost immediately.
This doesn't cause any problems that I could detect in my tests, but it wastes resources and causes logs to contain misleading messages about the executor being killed. It would be nice to avoid that.
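To illustrate the race, here is a minimal sketch (not Spark's actual implementation; all names such as Snapshot, excessPendingPods, and targetExecutors are hypothetical). The allocator decides which pods to delete based on a snapshot in which they still look pending, but by the time the delete request reaches the API server, any of those pods may already have started and registered with the driver.

```scala
// Simplified sketch of the downscaling race described above.
// All types and names here are hypothetical, not Spark's actual code.
object AllocatorRaceSketch {

  sealed trait PodState
  case object Pending extends PodState
  case object Running extends PodState

  // A snapshot of pod states as last observed by the allocator.
  final case class Snapshot(pods: Map[String, PodState])

  /** Pods the allocator believes are still pending and therefore "safe" to
    * delete when the target executor count shrinks. Between taking the
    * snapshot and issuing the delete, the API server may already have
    * started some of these pods, so the delete can kill an executor that
    * has just connected back to the driver.
    */
  def excessPendingPods(snapshot: Snapshot, targetExecutors: Int): Seq[String] = {
    val running = snapshot.pods.count { case (_, state) => state == Running }
    val pending = snapshot.pods.collect { case (name, Pending) => name }.toSeq
    val excess = (running + pending.size) - targetExecutors
    if (excess <= 0) Seq.empty else pending.take(excess)
  }

  def main(args: Array[String]): Unit = {
    val snapshot = Snapshot(Map(
      "exec-1" -> Running,
      "exec-2" -> Pending,
      "exec-3" -> Pending))
    // Downscale to 1 executor: exec-2 and exec-3 are selected for deletion,
    // even though either may have transitioned to Running since the snapshot
    // was taken -- this is the window where the race occurs.
    println(excessPendingPods(snapshot, targetExecutors = 1))
  }
}
```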
Attachments
Issue Links
- links to