[MESOS-8574] Docker executor makes no progress when 'docker inspect' hangs - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.5.0
Fix Version/s: 1.4.2, 1.5.1, 1.6.0
Component/s: docker, executor
Labels: mesosphere
Epic Link: Docker Improvements
Sprint: Mesosphere Sprint 75, Mesosphere Sprint 76
Story Points: 5

Description

In the Docker executor, many calls later in the executor's lifecycle are gated on an initial docker inspect call returning: https://github.com/apache/mesos/blob/bc6b61bca37752689cffa40a14c53ad89f24e8fc/src/docker/executor.cpp#L223

If that first call to docker inspect never returns, the executor becomes stuck in a state where it makes no progress and cannot be killed.

It's tempting to have the executor simply terminate itself after a timeout, but we must be careful of the case in which the executor's Docker container is running successfully while the Docker daemon is unresponsive. In that case the task itself is healthy, so we do not want to send TASK_FAILED or TASK_KILLED.
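As a rough illustration of the constraint above, here is a minimal standalone C++ sketch, not the Mesos implementation and not using libprocess. It bounds a blocking inspect call with a timeout and, on timeout, chooses to retry without emitting a terminal status update, since a hung `docker inspect` says nothing about whether the container is healthy. The names `callWithTimeout`, `Action`, and `decide` are hypothetical, and the `inspect` callable stands in for whatever actually runs `docker inspect`.

```cpp
#include <chrono>
#include <exception>
#include <future>
#include <memory>
#include <optional>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper: runs `inspect` on a detached thread and waits at most
// `timeout` for its result. Returns std::nullopt if the call has not returned
// in time. The hung call is abandoned, not cancelled.
template <typename F>
std::optional<std::string> callWithTimeout(
    F inspect, std::chrono::milliseconds timeout)
{
  // The promise lives on the heap so the worker can still fulfil it safely
  // after the caller has given up and returned.
  auto promise = std::make_shared<std::promise<std::string>>();
  std::future<std::string> future = promise->get_future();

  std::thread([promise, inspect = std::move(inspect)]() mutable {
    try {
      promise->set_value(inspect());
    } catch (...) {
      promise->set_exception(std::current_exception());
    }
  }).detach();

  if (future.wait_for(timeout) == std::future_status::ready) {
    return future.get(); // Rethrows if `inspect` threw.
  }

  return std::nullopt;
}

// Hypothetical decision type for the executor's next step.
enum class Action
{
  Continue,                  // Inspect returned; the lifecycle can proceed.
  RetryWithoutTerminalUpdate // Inspect hung; retry, but do not report
                             // TASK_FAILED or TASK_KILLED.
};

// A timeout only tells us the daemon did not answer; the container may
// still be running, so it must not be treated as a task failure.
Action decide(const std::optional<std::string>& inspectResult)
{
  return inspectResult.has_value() ? Action::Continue
                                   : Action::RetryWithoutTerminalUpdate;
}
```

A caller would loop on `callWithTimeout` with some retry interval and only send a terminal update once it has positive evidence about the container's state. Note that an abandoned call leaves a thread behind in this sketch; a real executor would also need a way to discard or reap the hung operation.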