[FLINK-15843] Gracefully shutdown TaskManagers on Kubernetes - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Not A Problem
Affects Version/s: 1.10.0
Fix Version/s: None
Component/s: Deployment / Kubernetes
Labels: None

Description

Currently, when the JobManager sends a deletion request, a TaskManager instance is stopped by directly calling KubernetesClient.pods().withName().delete. As a result, the instance is forcibly killed with a KILL signal and has no chance to clean up. This could cause problems, because we expect the process to terminate gracefully when it is no longer needed.

According to the Kubernetes guide on Termination of Pods, a TERM signal is first sent to the main process in each container, and may be followed by a forced KILL signal once the graceful shutdown period has expired. The Unix signal is sent to the process with PID 1 (see Docker Kill). However, the TaskManagerRunner process is spawned by /opt/flink/bin/kubernetes-entry.sh and can never have PID 1, so it never receives the TERM signal.
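The PID argument above can be illustrated with a small shell sketch. It is not taken from kubernetes-entry.sh, and the function names started_with_exec and started_as_child are hypothetical. It only shows the general shell behavior the description relies on: a command started as a child of a wrapper script gets a new PID, while a command started with exec takes over the wrapper's PID (which would be PID 1 if the wrapper were the container's main process).

```shell
#!/bin/sh
# Illustrative sketch only; these helper names are hypothetical and do not
# appear in the Flink entry script.

# The wrapper shell records its own PID, then replaces itself with the
# command via exec. The command therefore runs under the wrapper's PID.
started_with_exec() {
  sh -c 'wrapper=$$; exec sh -c "[ \$\$ -eq $wrapper ] && echo same || echo different"'
}

# The wrapper shell starts the command as an ordinary child process.
# The trailing ":" keeps the wrapper alive as a separate process, so the
# command runs under a new PID and the wrapper keeps the original one.
started_as_child() {
  sh -c 'wrapper=$$; sh -c "[ \$\$ -eq $wrapper ] && echo same || echo different"; :'
}

started_with_exec
started_as_child
```

In the second case, a signal delivered only to the wrapper's PID does not reach the command unless the wrapper forwards it. Starting the command with exec, or running it under a minimal init such as the one proposed in the linked FLINK-17034, are common ways to address this; whether either applies to the Flink image is left open here.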

Issue Links

is blocked by

FLINK-17034 Execute the container CMD under TINI for better hygiene (Status: Open)

People

Assignee: Unassigned

Reporter: Canbin Zheng

Votes: 0

Watchers: 3

Dates

Created: 02/Feb/20 11:17

Updated: 29/Jan/21 11:04

Resolved: 29/Jan/21 11:04