Even if NM succeeded in sending the increasedContainers to RM, if NM re-registers with RM before the container size is updated, RM will also recover the container with old resources.
The increasedContainers contains the target resource size for each container being increased. So if NM re-registers with RM before the container size is updated, RM will check both containerStatus (which contains the old resource for the container) and increasedContainers (which contains the target resource for the container) in the RegisterNodeManager request, and will be able to recover the correct container size.
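To make the recovery rule above concrete, here is a minimal sketch in plain Java. It is not the actual RM code: the class name, the `recoverSizes` helper, and the use of a simple map from container id to memory in MB are all assumptions made for illustration. It only shows the idea that a target size in increasedContainers takes precedence over the old size in containerStatus.

```java
import java.util.HashMap;
import java.util.Map;

final class RegistrationRecoverySketch {

    // Hypothetical helper, not a real YARN API.
    // Both maps are container id -> memory in MB, standing in for the
    // containerStatus list and the proposed increasedContainers list
    // carried by the registration request.
    static Map<String, Integer> recoverSizes(
            Map<String, Integer> containerStatuses,
            Map<String, Integer> increasedContainers) {
        // Start from the sizes NM currently reports (possibly the old ones).
        Map<String, Integer> recovered = new HashMap<>(containerStatuses);
        for (Map.Entry<String, Integer> pending : increasedContainers.entrySet()) {
            // Only apply a target size to a container NM actually reported.
            if (recovered.containsKey(pending.getKey())) {
                recovered.put(pending.getKey(), pending.getValue());
            }
        }
        return recovered;
    }

    public static void main(String[] args) {
        Map<String, Integer> statuses = new HashMap<>();
        statuses.put("container_1", 2048); // resize not yet applied in NM
        statuses.put("container_2", 1024); // no pending increase

        Map<String, Integer> increased = new HashMap<>();
        increased.put("container_1", 4096); // target size of the increase

        Map<String, Integer> recovered = recoverSizes(statuses, increased);
        System.out.println("container_1 -> " + recovered.get("container_1"));
        System.out.println("container_2 -> " + recovered.get("container_2"));
    }
}
```

In this sketch container_1 is recovered at its target size and container_2 keeps the size from its status, which is the behavior I am arguing for.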
The solution I have in mind is that we do not keep track of extra increasedContainers in NMContext. We always rely on NMContext#containers to send the container status. RM will check container size based on the containerStatus in node heartbeat.
The question I have with this solution is: how does RM know that an increase has been successfully completed in NM without an explicit protocol? Does RM keep checking the size of each container reported by NM from heartbeat to heartbeat, and decide that an increase has been completed if the container size from the previous heartbeat is smaller than the container size from the current heartbeat? I think this won't work in the RM restart scenario you mentioned. Consider the following sequence of events:
- RM restarts while there is an increase going on in NM
- NM re-registers with RM before the container size is updated in NM, and RM recovers all containers with old resources, and builds up its internal resource bookkeeping for the scheduler
- Later on, the container size is updated in NM, and RM gets the increased container size in the next heartbeat request. What should RM do now? It cannot simply go ahead and increase the resource bookkeeping in its scheduler, because the scheduler did not allocate the extra resource after restart.
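The sequence above can be sketched with a toy model of per-node bookkeeping. This is only an illustration under stated assumptions: `NodeBookkeeping`, its methods, the 8192 MB node capacity, and the container sizes are all hypothetical and do not correspond to real scheduler classes. The point is that if RM recovers the old size, the scheduler may hand out the memory the increase was supposed to use, and blindly applying the heartbeat size afterwards overcommits the node.

```java
import java.util.HashMap;
import java.util.Map;

final class RestartRaceSketch {

    // Hypothetical, simplified stand-in for the scheduler's view of one node.
    static final class NodeBookkeeping {
        private final int capacityMb;
        private final Map<String, Integer> allocated = new HashMap<>();

        NodeBookkeeping(int capacityMb) {
            this.capacityMb = capacityMb;
        }

        int usedMb() {
            int used = 0;
            for (int mb : allocated.values()) {
                used += mb;
            }
            return used;
        }

        int availableMb() {
            return capacityMb - usedMb();
        }

        // Recovery at NM registration: RM accepts whatever size it is told.
        void recover(String containerId, int mb) {
            allocated.put(containerId, mb);
        }

        // Normal scheduling: only succeeds if the node has room.
        boolean allocate(String containerId, int mb) {
            if (mb > availableMb()) {
                return false;
            }
            allocated.put(containerId, mb);
            return true;
        }

        // Naive handling: trust the size reported in a later heartbeat.
        void applyHeartbeatSize(String containerId, int mb) {
            allocated.put(containerId, mb);
        }
    }

    public static void main(String[] args) {
        NodeBookkeeping node = new NodeBookkeeping(8192);

        // RM restarted mid-increase and recovered the OLD size (2048 of 4096).
        node.recover("container_A", 2048);

        // The scheduler believes 6144 MB is free and gives it away.
        boolean allocatedB = node.allocate("container_B", 6144);
        System.out.println("container_B allocated: " + allocatedB);

        // The resize completes in NM; the next heartbeat reports 4096.
        node.applyHeartbeatSize("container_A", 4096);
        System.out.println("available after heartbeat: " + node.availableMb() + " MB");
    }
}
```

If the container is instead recovered at its target size of 4096 MB during registration, the later allocation of 6144 MB is rejected in this model and the bookkeeping never goes negative.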
IMHO, it is crucial for RM to recover the correct container size during NM registration if there is a pending container resource increase in NM. That is why I propose adding increasedContainers to RegisterNodeManagerRequestProto, and making sure that a container is only removed from increasedContainers when its resize is completed in NM.
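As a rough sketch of the NM-side invariant, assuming a hypothetical tracker class (none of these names exist in NMContext today): an entry stays in increasedContainers from the moment the increase is accepted until the new size has been applied, so a registration request built at any point in between still carries the target size.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical NM-side tracker, for illustration only.
final class NmIncreaseTrackerSketch {
    // Current size of each container, in MB (what containerStatus would report).
    private final Map<String, Integer> containers = new HashMap<>();
    // Target size of each container whose increase is still in progress.
    private final Map<String, Integer> increasedContainers = new HashMap<>();

    synchronized void startContainer(String containerId, int mb) {
        containers.put(containerId, mb);
    }

    // Called when NM accepts an increase; the resize itself happens later.
    synchronized void requestIncrease(String containerId, int targetMb) {
        if (containers.containsKey(containerId)) {
            increasedContainers.put(containerId, targetMb);
        }
    }

    // Called once the resize has actually been applied in NM.
    synchronized void completeResize(String containerId) {
        Integer target = increasedContainers.get(containerId);
        if (target == null) {
            return; // nothing pending for this container
        }
        // Update the reported size first, then drop the pending entry, so a
        // registration never sees the old size without the target size.
        containers.put(containerId, target);
        increasedContainers.remove(containerId);
    }

    synchronized Integer currentSize(String containerId) {
        return containers.get(containerId);
    }

    // Snapshot that would be attached to a registration request.
    synchronized Map<String, Integer> pendingIncreases() {
        return Collections.unmodifiableMap(new HashMap<>(increasedContainers));
    }

    public static void main(String[] args) {
        NmIncreaseTrackerSketch nm = new NmIncreaseTrackerSketch();
        nm.startContainer("container_1", 2048);
        nm.requestIncrease("container_1", 4096);
        System.out.println("pending before resize: " + nm.pendingIncreases());
        nm.completeResize("container_1");
        System.out.println("pending after resize: " + nm.pendingIncreases());
        System.out.println("current size: " + nm.currentSize("container_1"));
    }
}
```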