[FLINK-10743] Use 0 processExitCode for ApplicationStatus.CANCELED - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.6.3, 1.7.0
Fix Version/s: 1.8.0
Component/s: Deployment / Kubernetes, Deployment / Mesos, Deployment / YARN, Runtime / Coordination
Labels:
- pull-request-available

Description

org.apache.flink.runtime.clusterframework.ApplicationStatus is used to map org.apache.flink.runtime.jobgraph.JobStatus to a process exit code.

We currently map ApplicationStatus.CANCELED to a non-zero exit code (1444). Since cancellation is a user-triggered operation, I would consider it a successful exit and map it to exit code 0.
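To make the proposal concrete, here is a minimal, self-contained sketch of the mapping. It is not the actual Flink source: the class name `ExitCodeSketch` and the nested stand-in enums are hypothetical, and the exit codes shown for FAILED and UNKNOWN are assumptions for illustration. Only the CANCELED change from 1444 to 0 comes from this issue.

```java
// Hypothetical, simplified sketch of the proposed mapping; not the Flink source.
public class ExitCodeSketch {

    /** Stand-in for org.apache.flink.runtime.jobgraph.JobStatus (terminal states only). */
    enum JobStatus { FINISHED, FAILED, CANCELED }

    /** Stand-in for org.apache.flink.runtime.clusterframework.ApplicationStatus. */
    enum ApplicationStatus {
        SUCCEEDED(0),
        FAILED(1443),   // assumed value, for illustration only
        CANCELED(0),    // proposed in this issue; previously 1444
        UNKNOWN(1445);  // assumed value, for illustration only

        private final int processExitCode;

        ApplicationStatus(int processExitCode) {
            this.processExitCode = processExitCode;
        }

        /** Exit code the cluster entry point process would terminate with. */
        int processExitCode() {
            return processExitCode;
        }

        /** Maps a terminal job status to an application status; null maps to UNKNOWN. */
        static ApplicationStatus fromJobStatus(JobStatus jobStatus) {
            if (jobStatus == null) {
                return UNKNOWN;
            }
            switch (jobStatus) {
                case FINISHED:
                    return SUCCEEDED;
                case FAILED:
                    return FAILED;
                case CANCELED:
                    return CANCELED;
                default:
                    return UNKNOWN;
            }
        }
    }

    public static void main(String[] args) {
        // Print the exit code each terminal job status would produce.
        for (JobStatus status : JobStatus.values()) {
            ApplicationStatus app = ApplicationStatus.fromJobStatus(status);
            System.out.println(status + " -> exit code " + app.processExitCode());
        }
    }
}
```

With this mapping, a cancelled job and a finished job both terminate the process with exit code 0, so a supervisor that restarts only on non-zero exit codes would leave a cancelled job cluster stopped.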

With the current behavior, applications running via the StandaloneJobClusterEntryPoint, and Kubernetes pods set up as documented in flink-container, are immediately restarted when cancelled. The only way to stop them is to kill the respective job cluster master container.

The ApplicationStatus is also used in the YARN and Mesos clients, but I'm not familiar with that part of the code base and can't assess how changing the exit code would affect these clients. A quick usage scan for ApplicationStatus.CANCELED did not surface any problematic usages, though.

Issue Links

- relates to: FLINK-18828 Terminate jobmanager process with zero exit code to avoid unexpected restarting by K8s (Open)
- links to: GitHub Pull Request #7004

People

Assignee: Ufuk Celebi
Reporter: Ufuk Celebi
Votes: 0
Watchers: 5

Dates

Created: 31/Oct/18 21:09
Updated: 05/Aug/20 11:56
Resolved: 11/Dec/18 09:25