Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- 1.0.1, 1.1.0, 1.2.1, 1.3.0
- Mesosphere Sprint 61
- 5
Description
If `docker stop` exits with an error status, the executor should catch this and react instead of waiting indefinitely for `reaped` to return.
An interesting question is how to react. Here are possible solutions.
1. Retry `docker stop`. In this case it is unclear how many times to retry and what to do if `docker stop` keeps failing.
2. Unmark the task as killed. This allows frameworks to retry the kill. However, it is unclear which status updates we should send: a TASK_KILLING for every kill retry, an extra update when we fail to kill a task, or a specific reason set in TASK_KILLING?
3. Clean up and exit. In this case we should either make sure the task container is killed, or notify the framework and the operator that the container may still be running.
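To make the trade-offs concrete, here is a minimal sketch that combines options 1 and 3: retry `docker stop` a bounded number of times, then escalate to `docker kill`, and report an explicit "unknown" outcome if that fails too, so the caller can notify the framework and the operator. This is not the executor's actual code. The names `stop_container`, `run_command`, and `Runner`, the retry count, the backoff, and the timeout are all assumptions made for illustration; only the `docker stop` and `docker kill` commands themselves are real.

```python
import subprocess
import time
from typing import Callable, Sequence

# Hypothetical runner type: takes an argv list and returns an exit code.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str], timeout: float = 30.0) -> int:
    """Run a command and return its exit code.

    A hang (timeout) or a failure to launch is reported as a non-zero
    code, so the caller never waits indefinitely. The 30s default is an
    assumed value, not one taken from Mesos.
    """
    try:
        return subprocess.run(argv, timeout=timeout).returncode
    except (subprocess.TimeoutExpired, OSError):
        return 1


def stop_container(
    name: str,
    runner: Runner = run_command,
    max_attempts: int = 3,
    backoff: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "stopped", "killed", or "unknown"."""
    # Option 1: retry `docker stop`, but only a bounded number of times.
    for attempt in range(1, max_attempts + 1):
        if runner(["docker", "stop", name]) == 0:
            return "stopped"
        if attempt < max_attempts:
            sleep(backoff * attempt)  # Simple linear backoff (assumed).

    # Option 3: escalate, then clean up and exit either way.
    if runner(["docker", "kill", name]) == 0:
        return "killed"

    # The container may still be running; the caller should surface this
    # to the framework and the operator instead of hanging.
    return "unknown"
```

The runner and sleep function are injected so the retry logic can be exercised without a Docker daemon. Option 2 (unmarking the task as killed) is not covered here because it depends on the status update semantics discussed above.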
Attachments
Issue Links
- is duplicated by
  - MESOS-5722 Docker executor should have a workaround for unresponsive `docker stop`. (Resolved)
- is related to
  - MESOS-4673 Agent fails to shutdown after re-registering period timed-out. (Resolved)
- relates to
  - MESOS-7627 Mesos slave stucks (Resolved)