Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- 1.5.0
- None
- Mesosphere Sprint 74, Mesosphere Sprint 75
- 3
Description
When the force argument is not set to true, Docker::pull always performs a docker inspect call before it does a docker pull. If either of these two Docker CLI calls hangs indefinitely, the Docker container stays stuck in the PULLING state. No further progress is made in the launch() call path: the executor binary is never executed, the Future associated with the launch() call is never failed or satisfied, and wait() is never called on the container. The agent chains the executor cleanup onto that wait() call, which is never made. As a result, when the executor registration timeout elapses, containerizer->destroy() is called on the executor container, but the rest of the executor cleanup is never performed and no terminal task status update is sent.
This leaves the task destined for that Docker executor stuck in TASK_STAGING from the framework's perspective, and attempts to kill the task will fail.
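The failure mode above can be illustrated with a small sketch. It is not the Mesos implementation and not the fix applied for this ticket (the parent issue addresses discard handling); it only shows, under stated assumptions, how putting a deadline on each CLI step turns an indefinite hang into a failure the caller can act on. The names run_with_deadline, pull_image and StepTimedOut are hypothetical, and the commands are passed in as parameters so that the sketch runs without Docker installed.

```python
import subprocess
import sys
from typing import Sequence


class StepTimedOut(Exception):
    """Raised when a CLI step does not finish before its deadline."""


def run_with_deadline(cmd: Sequence[str], timeout_secs: float) -> int:
    """Run cmd and return its exit code, or raise StepTimedOut on a hang."""
    try:
        # subprocess.run kills and reaps the child when the timeout expires,
        # then raises TimeoutExpired.
        completed = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_secs,
        )
    except subprocess.TimeoutExpired as exc:
        raise StepTimedOut(f"{cmd[0]} exceeded {timeout_secs}s") from exc
    return completed.returncode


def pull_image(
    inspect_cmd: Sequence[str],
    pull_cmd: Sequence[str],
    force: bool = False,
    timeout_secs: float = 60.0,
) -> str:
    """Hypothetical stand-in for the inspect-then-pull sequence.

    Simplification: the inspect exit code is ignored and the pull always
    runs afterwards. Only a hang in either step is treated as an error.
    """
    if not force:
        run_with_deadline(inspect_cmd, timeout_secs)
    exit_code = run_with_deadline(pull_cmd, timeout_secs)
    return "PULLED" if exit_code == 0 else "FAILED"


if __name__ == "__main__":
    # Stand-ins for the Docker CLI: one command that hangs, one that exits.
    hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
    quick = [sys.executable, "-c", "pass"]
    try:
        pull_image(hanging, quick, timeout_secs=0.5)
    except StepTimedOut as err:
        # The caller now has a definite failure to attach cleanup to.
        print("launch failed:", err)
```

With a deadline in place, the caller receives an exception instead of a result that never arrives, so cleanup and a terminal status update have something to chain onto. Whether a timeout or explicit discard handling is the better remedy is a design choice this sketch does not settle.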
Issue Links
- is a child of MESOS-8575 Improve discard handling for 'Docker::stop' and 'Docker::pull'. (Resolved)