Details
Type: Improvement
Status: Open
Priority: Trivial
Resolution: Unresolved
Affects Version/s: 3.1.0
Fix Version/s: None
Component/s: None
Description
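When a NodeManager (NM) is stopped or decommissioned, it stops all of its containers. The time it waits for them to finish is the sum of three values: sleep-delay-before-sigkill + process-kill-wait + SHUTDOWN_CLEANUP_SLOP_MS (a constant, 1000 ms). With the values below, the total wait is 6250 ms.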
yarn.nodemanager.sleep-delay-before-sigkill.ms=250
yarn.nodemanager.process-kill-wait.ms=5000
SHUTDOWN_CLEANUP_SLOP_MS=1000
sleep-delay-before-sigkill and process-kill-wait cover the time needed to kill a single container/process. When there are many containers to kill, the wait often expires before all of them have been killed.
We can make SHUTDOWN_CLEANUP_SLOP_MS a configurable parameter so that, in such scenarios, the NM can wait longer for all containers to be killed.
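
A minimal sketch of the proposal, not the actual NodeManager code: it computes the shutdown wait as the sum of the three values and reads the slop from configuration instead of a constant. The property name yarn.nodemanager.shutdown-cleanup-slop.ms is hypothetical (it is not an existing YARN setting), and java.util.Properties stands in for Hadoop's Configuration class so the example is self-contained.

```java
import java.util.Properties;

public class ShutdownWaitSketch {
    // Existing NodeManager properties, with the values quoted in this issue.
    static final String SIGKILL_DELAY_KEY = "yarn.nodemanager.sleep-delay-before-sigkill.ms";
    static final long DEFAULT_SIGKILL_DELAY_MS = 250L;
    static final String KILL_WAIT_KEY = "yarn.nodemanager.process-kill-wait.ms";
    static final long DEFAULT_KILL_WAIT_MS = 5000L;

    // HYPOTHETICAL property name, introduced only for illustration.
    static final String SLOP_KEY = "yarn.nodemanager.shutdown-cleanup-slop.ms";
    // Default keeps today's behaviour (the current constant).
    static final long DEFAULT_SLOP_MS = 1000L;

    // Read a long value from the configuration, falling back to a default.
    static long getLong(Properties conf, String key, long defaultValue) {
        String value = conf.getProperty(key);
        return value == null ? defaultValue : Long.parseLong(value.trim());
    }

    // Total time to wait for containers during shutdown: sum of the three values.
    static long shutdownWaitMs(Properties conf) {
        return getLong(conf, SIGKILL_DELAY_KEY, DEFAULT_SIGKILL_DELAY_MS)
             + getLong(conf, KILL_WAIT_KEY, DEFAULT_KILL_WAIT_MS)
             + getLong(conf, SLOP_KEY, DEFAULT_SLOP_MS);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(shutdownWaitMs(conf)); // 6250 with the defaults

        // An operator with many containers per node raises only the slop.
        conf.setProperty(SLOP_KEY, "30000");
        System.out.println(shutdownWaitMs(conf)); // 35250
    }
}
```

Keeping the default at 1000 ms would leave existing clusters unchanged; only operators who set the new property would see a longer shutdown wait.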