Details
- New Feature
- Status: Resolved
- Major
- Resolution: Fixed
- 3.2.1
- None
Description
Currently, if a NodeManager is enabled to allocate Docker containers but the specified binary (docker.binary in container-executor.cfg) is missing, container allocation fails with the following error message:
Container launch fails
Exit code: 29
Exception message: Launch container failed
Shell error output: sh: <docker binary path, /usr/bin/docker by default>: No such file or directory
Could not inspect docker network to get type /usr/bin/docker network inspect host --format='{{.Driver}}'.
Error constructing docker command, docker error code=-1, error message='Unknown error'
I suggest adding a property, say "yarn.nodemanager.runtime.linux.docker.check", with the following options:
- STARTUP: with this option, the NodeManager does not start if the Docker binaries are missing or the Docker daemon is not running (the exception is considered FATAL during startup).
- RUNTIME: gives a more detailed, user-friendly exception on the NodeManager side (NM logs) if the Docker binaries are missing or the daemon is not working. This also prevents further Docker container allocation for as long as the binaries are missing or the Docker daemon is not running.
- NONE (default): preserves the current behaviour, throwing an exception during container allocation and carrying on with the default retry procedure.
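The three options above could be sketched roughly as follows. This is only an illustration of the proposed semantics, not the committed patch: the names DockerCheckMode, DockerBinaryCheck, parseMode and check are hypothetical, and the sketch covers only the missing-binary case (probing whether the Docker daemon is running is left out).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

/** Hypothetical enum mirroring the proposed property values. */
enum DockerCheckMode { STARTUP, RUNTIME, NONE }

/** Hypothetical helper; not part of the Hadoop code base. */
final class DockerBinaryCheck {

    /** Parses the property value; a missing or blank value falls back to NONE. */
    static DockerCheckMode parseMode(String value) {
        if (value == null || value.trim().isEmpty()) {
            return DockerCheckMode.NONE;
        }
        return DockerCheckMode.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /** True if the configured docker.binary exists and is executable. */
    static boolean binaryPresent(Path binary) {
        return Files.isRegularFile(binary) && Files.isExecutable(binary);
    }

    /**
     * Returns null when nothing is wrong or when checking is disabled (NONE).
     * In RUNTIME mode a descriptive message is returned so the caller can log it
     * and stop allocating Docker containers; in STARTUP mode the problem is
     * treated as fatal and an exception is thrown.
     */
    static String check(DockerCheckMode mode, Path binary) {
        if (mode == DockerCheckMode.NONE || binaryPresent(binary)) {
            return null;
        }
        String message = "Docker binary not found or not executable: " + binary;
        if (mode == DockerCheckMode.STARTUP) {
            throw new IllegalStateException(message);
        }
        return message;
    }
}
```

With NONE the check is skipped entirely, so the failure still surfaces at container launch exactly as it does today.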
------------------------------------------------------------------------------------------------
A new interface called HealthChecker is introduced and used in the NodeHealthCheckerService. Existing implementations such as LocalDirsHandlerService are modified to implement it, giving a clear abstraction of the node's health. The DockerHealthChecker implements this new interface.
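A minimal sketch of what such an abstraction might look like is given here. Only the interface name HealthChecker comes from the description; the method names isHealthy and getHealthReport and the CompositeHealthChecker class are assumptions made for illustration and may differ from the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the interface; method names are assumed, not taken from the patch. */
interface HealthChecker {
    boolean isHealthy();
    String getHealthReport();
}

/**
 * Hypothetical aggregator standing in for the role of NodeHealthCheckerService:
 * the node counts as healthy only if every registered checker is healthy.
 */
final class CompositeHealthChecker implements HealthChecker {
    private final List<HealthChecker> checkers;

    CompositeHealthChecker(List<HealthChecker> checkers) {
        this.checkers = new ArrayList<>(checkers);
    }

    @Override
    public boolean isHealthy() {
        for (HealthChecker checker : checkers) {
            if (!checker.isHealthy()) {
                return false;
            }
        }
        return true;
    }

    /** Joins the reports of all unhealthy checkers; empty when the node is healthy. */
    @Override
    public String getHealthReport() {
        StringBuilder report = new StringBuilder();
        for (HealthChecker checker : checkers) {
            if (!checker.isHealthy()) {
                if (report.length() > 0) {
                    report.append("; ");
                }
                report.append(checker.getHealthReport());
            }
        }
        return report.toString();
    }
}
```

Under this shape, a Docker-specific checker and a local-directories checker would simply be two entries in the list, and the service would not need to know which kind of check failed.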
Attachments
Issue Links
- supercedes
- YARN-2980 Move health check script related functionality to hadoop-common (Resolved)