[MESOS-8488] Docker bug can cause unkillable tasks. - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.5.0
Fix Version/s: 1.4.2, 1.5.1, 1.6.0
Component/s: containerization
Labels: mesosphere
Target Version/s: 1.6.0
Epic Link: Docker Improvements
Sprint: Mesosphere Sprint 74
Story Points: 2

Description

Due to an issue in the Moby project, Docker versions 1.13 and later can fail to catch a container exit, so the docker run command that launched the container never returns. This can leave the Docker executor stuck in a state where it believes the container is still running and cannot be killed.

We should update the Docker executor to ensure that containers stuck in such a state cannot cause unkillable Docker executors/tasks.

One way to do this would be a timeout: if a kill task attempt has not succeeded within that time, the Docker executor terminates itself. If we do this, we should also ensure that, in the case where the container was actually still running, either the Docker daemon or the DockerContainerizer cleans up the container when it does exit.
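The timeout idea can be sketched roughly as follows. This is a minimal illustration in Python, not the Mesos Docker executor's code: the function name kill_with_deadline and the deadline value are hypothetical, and an ordinary child process stands in for the docker run command that may never return.

```python
import subprocess


def kill_with_deadline(proc, deadline_seconds):
    """Ask `proc` to stop, then wait at most `deadline_seconds` for its exit.

    `proc` is a subprocess.Popen standing in for the `docker run` process.
    Returns True if the exit was observed in time, False if the deadline
    passed first. A False result is the point at which an executor following
    the timeout approach would give up and terminate itself.
    """
    proc.terminate()  # sends SIGTERM on POSIX systems
    try:
        proc.wait(timeout=deadline_seconds)
        return True
    except subprocess.TimeoutExpired:
        # The exit was never observed; the caller decides what to do next.
        return False
```

A caller that receives False would exit rather than wait forever, leaving cleanup of any container that is in fact still running to the Docker daemon or the DockerContainerizer, as described above.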

Another option might be for the Docker executor to directly wait() on the container's Linux PID, in order to notice when the container exits.
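As a rough sketch of watching the container's PID directly, the Python below polls for the process instead of calling wait(). This swaps in a different technique on purpose: wait() only works on a process's own children, and the container's init process is normally not a child of the executor. The helper names are hypothetical, and the PID is assumed to have been obtained separately, for example with docker inspect --format '{{.State.Pid}}' on the container. Polling by PID is also subject to PID reuse, and a zombie process still counts as present until it is reaped.

```python
import os
import time


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def wait_for_pid_exit(pid, timeout_seconds, poll_interval=0.05):
    """Poll until `pid` disappears or `timeout_seconds` elapses.

    Returns True if the process is gone, False if it is still present
    when the timeout expires.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if not pid_is_alive(pid):
            return True
        time.sleep(poll_interval)
    return not pid_is_alive(pid)
```

On newer Linux kernels a pidfd would avoid both the polling and the PID reuse caveat; the loop above is kept deliberately simple.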

Issue Links

causes: MESOS-8876 Normal exit of Docker container using rexray volume results in TASK_FAILED. (Resolved)

is related to: MESOS-8694 Possibly unkillable task for customer docker executor if the docker daemon fails to capture the container exit code. (Open)

People

Assignee: Qian Zhang

Reporter: Greg Mann

Shepherd: Greg Mann

Votes: 0

Watchers: 4

Dates

Created: 25/Jan/18 01:16

Updated: 22/Mar/19 16:41

Resolved: 14/Feb/18 03:17