I was testing the client-side NM graceful decommission and noticed that it always waits for the full timeout, even if all jobs running on that node (or even on the cluster) have already finished.
For example:
- JobA is running with at least one container on NodeA
- User runs client-side decommission on NodeA at 5:00am with a timeout of 3 hours --> NodeA enters DECOMMISSIONING state
- JobA finishes at 6:00am and there are no other jobs running on NodeA
- User's client reaches the timeout at 8:00am and forcibly decommissions NodeA
NodeA should have been decommissioned at 6:00am, as soon as JobA finished and no containers were left running on it.
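
The timeline above can be expressed as a small toy model. This is only a sketch of the expected behavior, not YARN code: `expected_decommission_time` is a hypothetical helper, and the calendar date is arbitrary (only the times of day come from the scenario). It assumes a node is ready to decommission once its last running container has ended, with the timeout acting as an upper bound.

```python
from datetime import datetime, timedelta


def expected_decommission_time(decom_start, timeout, container_end_times):
    """Toy model: when should a DECOMMISSIONING node be decommissioned?

    Hypothetical helper for illustration only; not part of YARN.
    """
    deadline = decom_start + timeout
    # The node is drained once its last running container ends. With no
    # containers at all, it is drained as soon as decommissioning starts.
    drained_at = max(container_end_times, default=decom_start)
    drained_at = max(drained_at, decom_start)
    # Decommission at whichever comes first: node drained, or timeout reached.
    return min(drained_at, deadline)


start = datetime(2016, 1, 1, 5, 0)      # decommission issued at 5:00am
timeout = timedelta(hours=3)            # 3 hour timeout
job_a_end = datetime(2016, 1, 1, 6, 0)  # JobA finishes at 6:00am

expected = expected_decommission_time(start, timeout, [job_a_end])
observed = start + timeout              # behavior described in this report

print(expected.strftime("%H:%M"), observed.strftime("%H:%M"))  # 06:00 08:00
```

The two-hour gap between the two printed times is the time NodeA spends idle in DECOMMISSIONING state in the scenario.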