I would like to add the ability to completely remove a nodemanager from the resourcemanager's state.
I run a cloud service where I dynamically bring up nodes to act as nodemanagers and bring them down again when they are no longer needed. These nodes have dynamically assigned IPs, so the alternative of decommissioning them via an excludes file leads to an unbounded list of decommissioned nodes that may never be commissioned again. I would like to be able to take a node that is in the decommissioned state and remove it from the resourcemanager completely.
I have thought of two ways of implementing this.
1) Add an optional timeout after which a node in the decommissioned state is removed from the resourcemanager.
2) Add an explicit RPC to remove a node that is decommissioned.
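To make the two options concrete, here is a rough standalone sketch of the bookkeeping each one implies. It is not based on the existing resourcemanager code: the class DecommissionedNodeTracker, its methods, and the timeout parameter are all hypothetical names made up for illustration, and a real patch would have to hook into whatever the resourcemanager already uses to track node state.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; none of these names exist in YARN. */
public class DecommissionedNodeTracker {
    // Node id -> time (ms) at which the node was marked decommissioned.
    private final Map<String, Long> decommissionedAt = new ConcurrentHashMap<>();
    // Assumed convention: a negative timeout disables automatic removal.
    private final long removalTimeoutMs;

    public DecommissionedNodeTracker(long removalTimeoutMs) {
        this.removalTimeoutMs = removalTimeoutMs;
    }

    public void markDecommissioned(String nodeId, long nowMs) {
        decommissionedAt.put(nodeId, nowMs);
    }

    /** Option 1: forget nodes that have been decommissioned for at least the timeout. */
    public int expire(long nowMs) {
        if (removalTimeoutMs < 0) {
            return 0; // timeout is optional; nothing expires when it is disabled
        }
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = decommissionedAt.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() >= removalTimeoutMs) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** Option 2: explicit removal; succeeds only if the node is currently decommissioned. */
    public boolean remove(String nodeId) {
        return decommissionedAt.remove(nodeId) != null;
    }

    public boolean isTracked(String nodeId) {
        return decommissionedAt.containsKey(nodeId);
    }
}
```

With option 1, expire() would be called periodically by some background thread in the resourcemanager. With option 2, remove() would be the handler behind the new RPC, and it refuses nodes that are not in the decommissioned state. The two are not mutually exclusive.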
Any additional thoughts/discussion are welcome.