SPARK-26423

[K8s] Make sure that disconnected executors eventually get deleted


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: Kubernetes, Spark Core

    Description

      If an executor disconnects, we currently only disable it in the KubernetesClusterSchedulerBackend and take no further action, expecting that all the other necessary actions (removing it from Spark, requesting a replacement executor, ...) will be driven by k8s lifecycle events.
      However, this only works if the executor disconnected because its pod is dying or shutting down.
      It does not work if there is merely a network issue between the driver and the executor while the executor pod keeps running in k8s.
      Thus (as indicated in the TODO comment in KubernetesClusterSchedulerBackend), we should make sure that a disconnected executor eventually does get killed in k8s.
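      One possible shape of a fix (a minimal sketch only, not the actual KubernetesClusterSchedulerBackend code): when the driver loses the connection to an executor, schedule a check after a grace period and, if the executor pod is still running in k8s, delete it explicitly so that normal pod-lifecycle handling requests a replacement. The class and method names below (DisconnectedExecutorReaper, onExecutorDisconnected, gracePeriodSeconds) are illustrative assumptions; only the fabric8 KubernetesClient calls are real API.

      import java.util.concurrent.{Executors, TimeUnit}

      import io.fabric8.kubernetes.client.KubernetesClient

      // Hypothetical sketch: delete the pod of an executor that has been
      // disconnected for longer than the grace period. If the pod has already
      // died, k8s lifecycle events handle cleanup and this becomes a no-op.
      class DisconnectedExecutorReaper(
          kubernetesClient: KubernetesClient,
          namespace: String,
          gracePeriodSeconds: Long) {

        private val scheduler = Executors.newSingleThreadScheduledExecutor()

        /** Called when the driver loses the RPC connection to an executor. */
        def onExecutorDisconnected(executorPodName: String): Unit = {
          scheduler.schedule(new Runnable {
            override def run(): Unit = {
              // If the pod is still there, the disconnect was likely a pure
              // network issue and no k8s lifecycle event will ever arrive,
              // so delete the pod explicitly.
              val pod = kubernetesClient.pods()
                .inNamespace(namespace)
                .withName(executorPodName)
                .get()
              if (pod != null) {
                kubernetesClient.pods()
                  .inNamespace(namespace)
                  .withName(executorPodName)
                  .delete()
              }
            }
          }, gracePeriodSeconds, TimeUnit.SECONDS)
        }
      }

      A real fix would also need to cancel the pending check if the executor reconnects before the grace period expires.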


          People

            Assignee: Unassigned
            Reporter: David Vogelbacher (dvogelbacher)
            Votes: 0
            Watchers: 3
