Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 1.5.0
- Sprint: Mesosphere Sprint 75, Mesosphere Sprint 76
- 5
Description
In the Docker executor, many calls later in the executor's lifecycle are gated on an initial docker inspect call returning: https://github.com/apache/mesos/blob/bc6b61bca37752689cffa40a14c53ad89f24e8fc/src/docker/executor.cpp#L223
If that first call to docker inspect never returns, the executor becomes stuck in a state where it makes no progress and cannot be killed.
It is tempting to have the executor simply terminate itself after a timeout, but we must be careful of the case in which the executor's Docker container is running successfully while the Docker daemon is unresponsive. In that case, we do not want to send TASK_FAILED or TASK_KILLED, because the task's container is still running.
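One way to picture the constraint above is a bounded retry around the inspect call that keeps "the daemon did not answer" separate from "the task failed". The following is only a minimal sketch under assumptions, not the code in executor.cpp and not necessarily the fix that was merged: the names InspectAttempt, inspectWithRetries, and mayReportTerminalStatus are hypothetical, and each attempt is modeled as a callable that returns a value if docker inspect answered within its timeout and an empty optional if it did not.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical model of one time-bounded `docker inspect` attempt:
// a value means the call returned in time, std::nullopt means it timed out.
using InspectAttempt = std::function<std::optional<std::string>()>;

enum class Outcome {
  Inspected,           // inspect returned; the lifecycle can continue
  DaemonUnresponsive   // every attempt timed out; container state is unknown
};

struct Result {
  Outcome outcome;
  int attempts;        // number of attempts actually made
  std::string output;  // inspect output, empty if no attempt returned
};

// Retry the inspect attempt up to `maxAttempts` times instead of
// waiting forever on a single call that may never return.
Result inspectWithRetries(const InspectAttempt& attempt, int maxAttempts) {
  int made = 0;
  while (made < maxAttempts) {
    ++made;
    if (std::optional<std::string> out = attempt()) {
      return {Outcome::Inspected, made, *out};
    }
  }
  return {Outcome::DaemonUnresponsive, made, ""};
}

// Assumption of this sketch: a terminal update such as TASK_FAILED or
// TASK_KILLED is only considered when inspect actually returned, because
// an unresponsive daemon says nothing about whether the container is
// still running successfully.
bool mayReportTerminalStatus(const Result& result) {
  return result.outcome == Outcome::Inspected;
}
```

In this sketch an exhausted retry budget yields DaemonUnresponsive rather than a task failure, so the caller can remain responsive (for example, to a kill request) without reporting a terminal state for a container that may still be healthy.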
Issue Links
- depends upon: MESOS-8575 Improve discard handling for 'Docker::stop' and 'Docker::pull'. (Resolved)
- is related to: MESOS-9191 Docker command executor may stuck at infinite unkillable loop. (Accepted)