Several parts of the Docker container lifecycle need to be improved when running Docker containers on YARN:
1) Provide the ability to keep a container on the NodeManager for a set period of time for debugging purposes.
2) Support sending signals to the process in the container to allow for triggering stack traces, heap dumps, etc.
3) Support Docker's live restore, which means moving away from the use of docker wait.
4) Improve the resiliency of liveness checks (kill -0) by adding retries.
5) Improve the resiliency of container removal by adding retries.
6) Only attempt to stop, kill, and remove containers if the current container state allows for it.
7) Better handle short-lived containers that are stopped before their PID can be retrieved.
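For item 2, the standard Docker CLI can already deliver a signal to a container's main process with docker kill --signal. Below is a minimal Java sketch. It assumes the NodeManager shells out to the docker CLI. The class DockerSignalCommand, its build method, and the container name are hypothetical. The sketch only builds the argument list and does not run docker.

```java
import java.util.List;

/**
 * Hypothetical helper (not a Hadoop API): builds the docker CLI arguments that
 * send a signal to the main process of a container via "docker kill --signal".
 */
public final class DockerSignalCommand {
    private DockerSignalCommand() {}

    public static List<String> build(String containerName, String signal) {
        if (containerName == null || containerName.isEmpty()) {
            throw new IllegalArgumentException("containerName must not be empty");
        }
        if (signal == null || signal.isEmpty()) {
            throw new IllegalArgumentException("signal must not be empty");
        }
        // docker kill accepts a signal name such as SIGQUIT, or a signal number.
        return List.of("docker", "kill", "--signal=" + signal, containerName);
    }

    public static void main(String[] args) {
        // Hypothetical container name; SIGQUIT asks a HotSpot JVM for a thread dump.
        List<String> cmd = build("container_example_01", "SIGQUIT");
        System.out.println(String.join(" ", cmd));
        // To actually send it: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

A HotSpot JVM responds to SIGQUIT by printing a thread dump to its standard output, so this covers the stack-trace case. Heap dumps generally need other tooling.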
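For item 3, one way to stop depending on a long-running docker wait process is to poll the container status instead. Below is a minimal sketch. It assumes a status probe, such as docker inspect --format '{{.State.Status}}', that may return null while the Docker daemon is restarting. ExitPoller and awaitExit are hypothetical names, and the poll interval and limit are illustrative.

```java
import java.util.Set;
import java.util.function.Supplier;

/**
 * Hypothetical poller (not a Hadoop API): waits for a container to reach a
 * terminal state by repeatedly asking for its status, instead of blocking on
 * "docker wait".
 */
public final class ExitPoller {
    // Terminal values of .State.Status as reported by docker inspect.
    private static final Set<String> TERMINAL = Set.of("exited", "dead");

    private ExitPoller() {}

    /**
     * Polls statusProbe up to maxPolls times, sleeping intervalMs between polls.
     * Returns the terminal status, or null if none was observed in time.
     */
    public static String awaitExit(Supplier<String> statusProbe, long intervalMs, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            String status = statusProbe.get();
            if (status != null && TERMINAL.contains(status.trim())) {
                return status.trim();
            }
            // A null status (daemon unreachable, e.g. during a restart with live
            // restore enabled) is treated like "still running": keep polling.
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Simulated probe: running, daemon briefly unavailable, running, exited.
        java.util.Iterator<String> answers =
            java.util.Arrays.asList("running", null, "running", "exited").iterator();
        System.out.println(awaitExit(answers::next, 10L, 10));
    }
}
```

The design choice here is that a transient failure to read the status is not treated as an exit. That matters for live restore, whose purpose is to keep containers running while the daemon is down.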
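For items 4 and 5, the same retry logic can serve both the kill -0 liveness check and container removal. Below is a minimal sketch. It assumes each attempt can be expressed as a boolean-returning call. RetryHelper, retry, and the attempt and delay values are hypothetical, and a real implementation would likely add logging and backoff.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical retry helper (not a Hadoop API) for flaky operations such as a
 * "kill -0" liveness check or a "docker rm" call.
 */
public final class RetryHelper {
    private RetryHelper() {}

    /**
     * Runs attempt up to maxAttempts times, sleeping delayMs between failed
     * attempts. Returns true as soon as one attempt succeeds, false if all fail
     * or the thread is interrupted while waiting.
     */
    public static boolean retry(BooleanSupplier attempt, int maxAttempts, long delayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(delayMs); // wait before the next try
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated liveness check that fails twice, then succeeds.
        boolean alive = retry(() -> ++calls[0] >= 3, 5, 10L);
        System.out.println("alive=" + alive + " after " + calls[0] + " attempts");
    }
}
```

For a liveness check, this means a container is declared dead only after every attempt fails, not after a single transient kill -0 failure.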
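For item 6, the guard can be expressed as a lookup from the container status to the actions that make sense in that state. The status is the one reported by docker inspect --format '{{.State.Status}}'. Below is a minimal sketch. DockerStateGuard and its enums are hypothetical. The policy is an assumption, not Docker's documented rules: stop and kill only for running or restarting containers, and remove only for created, exited, or dead ones.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical guard (not a Hadoop API): decides whether stop, kill, or remove
 * should be attempted for a container, given its docker inspect status string.
 */
public final class DockerStateGuard {
    /** Values of .State.Status, plus NONEXISTENT (empty output) and UNKNOWN. */
    public enum State { CREATED, RESTARTING, RUNNING, REMOVING, PAUSED, EXITED, DEAD, NONEXISTENT, UNKNOWN }

    public enum Action { STOP, KILL, REMOVE }

    // Assumed policy: only signal containers whose process may still be running...
    private static final Set<State> SIGNALABLE = EnumSet.of(State.RUNNING, State.RESTARTING);
    // ...and only remove containers that are not running and not already being removed.
    private static final Set<State> REMOVABLE = EnumSet.of(State.CREATED, State.EXITED, State.DEAD);

    private DockerStateGuard() {}

    /** Maps raw inspect output (e.g. "running\n") to a State. */
    public static State parse(String status) {
        if (status == null || status.trim().isEmpty()) {
            return State.NONEXISTENT;
        }
        try {
            return State.valueOf(status.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return State.UNKNOWN;
        }
    }

    public static boolean allowed(Action action, State state) {
        switch (action) {
            case STOP:
            case KILL:
                return SIGNALABLE.contains(state);
            case REMOVE:
                return REMOVABLE.contains(state);
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        State s = parse("exited\n");
        for (Action a : Action.values()) {
            System.out.println(a + " on " + s + ": " + allowed(a, s));
        }
    }
}
```

Under this policy, a NodeManager would skip docker rm for a container that is already REMOVING or NONEXISTENT instead of reporting a failure. It would also not try to stop a container that has already exited.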