Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
3.1.0
Description
Several code paths in the Docker container lifecycle need improvement when running Docker containers on YARN:
1) Provide the ability to keep a container on the NodeManager for a set period of time for debugging purposes.
2) Support sending signals to the process in the container to allow for triggering stack traces, heap dumps, etc.
3) Support for Docker's live restore, which means moving away from the use of docker wait. (YARN-5818)
4) Improve the resiliency of liveness checks (kill -0) by adding retries.
5) Improve the resiliency of container removal by adding retries.
6) Only attempt to stop, kill, and remove containers if the current container state allows for it.
7) Better handling of short-lived containers that stop before the PID can be retrieved. (YARN-6305)
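
As an illustration of items 4 through 6, the following is a minimal sketch, not the NodeManager implementation. It shows a `kill -0` style liveness probe, a generic retry wrapper, and a state check that gates removal. The state names are the ones Docker reports for a container, but the mapping in ALLOWED_STATES and every function name here (is_alive, retry, is_allowed, remove_container) are hypothetical and chosen only for this example.

```python
import os
import time
from typing import Callable

# Assumption: which Docker container states permit which operation.
# The state names are Docker's; the mapping itself is illustrative only.
ALLOWED_STATES = {
    "stop": {"running", "paused", "restarting"},
    "kill": {"running", "paused", "restarting"},
    "remove": {"created", "exited", "dead"},
}


def is_alive(pid: int) -> bool:
    """Liveness probe equivalent to `kill -0 <pid>` (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs error checking without sending a signal
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def retry(action: Callable[[], bool], attempts: int = 3,
          delay_seconds: float = 0.0) -> bool:
    """Run `action` until it returns True or `attempts` is exhausted."""
    for attempt in range(1, attempts + 1):
        if action():
            return True
        if attempt < attempts:
            time.sleep(delay_seconds)  # pause between attempts, not after the last
    return False


def is_allowed(operation: str, state: str) -> bool:
    """Return True if `operation` should be attempted in container `state`."""
    return state in ALLOWED_STATES.get(operation, set())


def remove_container(state: str, do_remove: Callable[[], bool],
                     attempts: int = 3, delay_seconds: float = 0.0) -> bool:
    """Attempt removal, with retries, only when the state allows it."""
    if not is_allowed("remove", state):
        return False  # skip the call entirely rather than fail repeatedly
    return retry(do_remove, attempts, delay_seconds)
```

A caller might wrap the probe as `retry(lambda: is_alive(pid), attempts=3, delay_seconds=1.0)` so that a single transient failure does not mark a container as dead. Passing the removal step in as a callable keeps the sketch free of any specific Docker invocation.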
Attachments
Issue Links
- causes
  - YARN-8326 Yarn 3.0 seems runs slower than Yarn 2.6 (Resolved)
- is duplicated by
  - YARN-5818 Support the Docker Live Restore feature (Resolved)
  - YARN-7278 LinuxContainer in docker mode will be failed when nodemanager restart, because timeout for docker is too slow. (Resolved)
- is related to
  - YARN-9074 Docker container rm command should be executed after stop (Resolved)
- supersedes
  - YARN-6305 Improve signaling of short lived containers (Resolved)