[MESOS-6743] Docker executor hangs forever if `docker stop` fails. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.0.1, 1.1.0, 1.2.1, 1.3.0
Fix Version/s: 1.1.3, 1.2.3, 1.3.2, 1.4.0
Component/s: docker
Labels: mesosphere, reliability
Sprint: Mesosphere Sprint 61
Story Points: 5

Description

If `docker stop` exits with an error status, the executor should catch this and react instead of waiting indefinitely for reaped to return.

An interesting question is how to react. Here are possible solutions.

1. Retry docker stop. In this case it is unclear how many times to retry and what to do if docker stop continues to fail.

2. Unmark task as killed. This will allow frameworks to retry the kill. However, it is unclear which status updates we should send in this case: a TASK_KILLING for every kill retry, an extra update when we fail to kill a task, or a specific reason set in TASK_KILLING?

3. Clean up and exit. In this case we should make sure the task container is killed or notify the framework and the operator that the container may still be running.
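
To make the trade-off between options 1 and 3 concrete, here is a minimal Python sketch of a bounded retry that reports failure to the caller instead of waiting forever. It is not the Mesos implementation (the Docker executor is written in C++). The names `run_command` and `stop_with_retries`, the attempt count, and the backoff values are assumptions chosen for illustration.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_command(argv: Sequence[str], timeout_s: float) -> int:
    """Run a command and return its exit status.

    A timeout or a missing binary is reported as a non-zero status so the
    caller never blocks indefinitely on a hung command.
    """
    try:
        return subprocess.run(argv, timeout=timeout_s).returncode
    except (subprocess.TimeoutExpired, OSError):
        return 1


def stop_with_retries(
    stop: Callable[[], int],
    max_attempts: int = 3,   # hypothetical limit, not taken from the ticket
    backoff_s: float = 1.0,  # hypothetical base delay between attempts
) -> bool:
    """Call `stop` until it returns 0 or the attempts are exhausted.

    Returns True if a stop attempt succeeded, False otherwise. On False the
    caller decides how to react, for example by cleaning up and reporting
    that the container may still be running (option 3).
    """
    for attempt in range(max_attempts):
        if stop() == 0:
            return True
        if attempt + 1 < max_attempts:
            # Exponential backoff before the next attempt.
            time.sleep(backoff_s * (2 ** attempt))
    return False
```

A caller could wire this to the Docker CLI with something like `stop_with_retries(lambda: run_command(["docker", "stop", "-t", "10", container_id], 30.0))`, where `container_id` is a placeholder. A `False` result is the point at which the executor would have to choose between unmarking the task as killed and exiting after notifying the framework and the operator.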