Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Fix Version/s: 3.0.0
Labels: None
Description
ExecutorPodsAllocator currently has code to delete excess pods that the K8S server hasn't started yet and that aren't needed anymore due to downscaling.
The problem is that there is a race between K8S starting the pod and the Spark code deleting it. This may cause the pod to connect back to Spark and do a lot of initialization, sometimes even being considered for task allocation, only to be killed almost immediately.
This doesn't cause any problems that I could detect in my tests, but it wastes resources and causes logs to contain misleading messages about the executor being killed. It would be nice to avoid that.
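To illustrate the race, here is a minimal sketch (not Spark's actual implementation; all names such as Snapshot, excessPendingPods, and targetExecutors are hypothetical). The allocator decides which pods to delete based on a snapshot in which they still look pending, but by the time the delete request reaches the API server, any of those pods may already have started and registered with the driver.

```scala
// Simplified sketch of the downscaling race described above.
// All types and names here are hypothetical, not Spark's actual code.
object AllocatorRaceSketch {

  sealed trait PodState
  case object Pending extends PodState
  case object Running extends PodState

  // A snapshot of pod states as last observed by the allocator.
  final case class Snapshot(pods: Map[String, PodState])

  /** Pods the allocator believes are still pending and therefore "safe" to
    * delete when the target executor count shrinks. Between taking the
    * snapshot and issuing the delete, the API server may already have
    * started some of these pods, so the delete can kill an executor that
    * has just connected back to the driver.
    */
  def excessPendingPods(snapshot: Snapshot, targetExecutors: Int): Seq[String] = {
    val running = snapshot.pods.count { case (_, state) => state == Running }
    val pending = snapshot.pods.collect { case (name, Pending) => name }.toSeq
    val excess = (running + pending.size) - targetExecutors
    if (excess <= 0) Seq.empty else pending.take(excess)
  }

  def main(args: Array[String]): Unit = {
    val snapshot = Snapshot(Map(
      "exec-1" -> Running,
      "exec-2" -> Pending,
      "exec-3" -> Pending))
    // Downscale to 1 executor: exec-2 and exec-3 are selected for deletion,
    // even though either may have transitioned to Running since the snapshot
    // was taken -- this is the window where the race occurs.
    println(excessPendingPods(snapshot, targetExecutors = 1))
  }
}
```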
Attachments
Issue Links
- links to