[FLINK-15817] Kubernetes Resource leak while deployment exception happens - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.10.0
Fix Version/s: 1.10.1, 1.11.0
Component/s: Deployment / Kubernetes
Labels:
- pull-request-available

Description

When we deploy a new session cluster on a Kubernetes cluster, the client usually creates four Kubernetes components in the following order: internal Service -> rest Service -> ConfigMap -> JobManager Deployment.

Once the internal Service has been created, any exception that aborts the cluster deployment process causes a Kubernetes resource leak. For example:

If the rest Service cannot be created because of the service name constraint (FLINK-15816), the internal Service is not cleaned up when the deployment process terminates.
If the JobManager Deployment cannot be created (for example, when jobmanager.heap.size is too small, such as 512M, which is less than the default value of 'containerized.heap-cutoff-min'), the internal Service, the rest Service, and the ConfigMap all leak.
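To make the two failure cases concrete, the following Java sketch lists which resources are left behind when each step fails and no clean-up runs. It is an illustrative model, not Flink code: `DeploymentLeakDemo` and `leakedOnFailure` are hypothetical names, and the sketch assumes exactly the four-step creation order described above and that a failing step creates nothing itself.

```java
import java.util.ArrayList;
import java.util.List;

class DeploymentLeakDemo {
    // Creation order as described in the ticket.
    static final List<String> CREATION_ORDER = List.of(
            "internal Service", "rest Service", "ConfigMap", "JobManager Deployment");

    // Returns the resources that already exist (and therefore leak) when the
    // given step throws and the client performs no clean-up.
    static List<String> leakedOnFailure(String failingStep) {
        int index = CREATION_ORDER.indexOf(failingStep);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown step: " + failingStep);
        }
        // Every earlier step succeeded; the failing resource itself was never created.
        return new ArrayList<>(CREATION_ORDER.subList(0, index));
    }

    public static void main(String[] args) {
        for (String step : CREATION_ORDER) {
            System.out.println("fails at " + step + " -> leaks " + leakedOnFailure(step));
        }
    }
}
```

A failure at the rest Service step leaves only the internal Service behind, while a failure at the JobManager Deployment step leaves the other three resources, matching the two examples above.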

This ticket proposes cleaning up the residual Services and ConfigMap if the cluster deployment process terminates unexpectedly on the client side.
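A common way to implement this kind of clean-up is to record each resource as it is created and, on failure, delete the recorded resources in reverse order before rethrowing. The Java sketch below illustrates that pattern under stated assumptions: `RollbackDeployer`, `ResourceClient`, and `FakeClient` are hypothetical names, the client is an in-memory stand-in rather than the Fabric8 Kubernetes client or any Flink class, and this is not the code from GitHub Pull Request #11672.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RollbackDeployer {
    // Hypothetical stand-in for a Kubernetes client (NOT the Fabric8 or Flink API).
    interface ResourceClient {
        void create(String name) throws Exception;
        void delete(String name);
    }

    // Creates resources in order; if any creation fails, deletes the
    // already-created resources in reverse order and rethrows the failure.
    static void deployWithCleanup(ResourceClient client, List<String> order) throws Exception {
        Deque<String> created = new ArrayDeque<>();
        try {
            for (String name : order) {
                client.create(name);
                created.push(name); // record only after a successful create
            }
        } catch (Exception e) {
            while (!created.isEmpty()) {
                String name = created.pop(); // most recently created first
                try {
                    client.delete(name);
                } catch (RuntimeException cleanupError) {
                    e.addSuppressed(cleanupError); // keep the deployment failure as primary
                }
            }
            throw e;
        }
    }

    // In-memory fake client that fails when asked to create one specific resource.
    static final class FakeClient implements ResourceClient {
        final Set<String> existing = new LinkedHashSet<>();
        final String failOn;

        FakeClient(String failOn) {
            this.failOn = failOn;
        }

        public void create(String name) throws Exception {
            if (name.equals(failOn)) {
                throw new Exception("failed to create " + name);
            }
            existing.add(name);
        }

        public void delete(String name) {
            existing.remove(name);
        }
    }

    public static void main(String[] args) {
        List<String> order = List.of(
                "internal Service", "rest Service", "ConfigMap", "JobManager Deployment");
        FakeClient client = new FakeClient("JobManager Deployment");
        try {
            deployWithCleanup(client, order);
        } catch (Exception e) {
            System.out.println(e.getMessage() + "; remaining: " + client.existing);
        }
    }
}
```

Running `main` simulates the JobManager Deployment failure from the description; after the rollback the fake client reports no remaining resources. Attaching clean-up errors as suppressed exceptions is one reasonable design choice: it keeps the original deployment failure as the error reported to the user while still surfacing any resource that could not be deleted.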

Issue Links

links to

GitHub Pull Request #11672

People

Assignee: Canbin Zheng

Reporter: Canbin Zheng

Votes: 0

Watchers: 3

Dates

Created: 31/Jan/20 03:18

Updated: 09/Apr/20 16:14

Resolved: 09/Apr/20 16:14

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 20m