I also do not see this as a performance bottleneck, as we are operating on a small set of running vs finished for a node per heartbeat.
The performance impact is basically zero because we're not doing that small set comparison most of the time. The only thing we do all the time is simply increment of an integer during a scan of the node report that was already being done before, then simply comparing that integer to the size of a hash set which is also super cheap. Only when those numbers are different do we do the diff between the set and the report. For
YARN-5197, those numbers will always be the same unless the NM failed to report a container completion which should be a rare event. The performance hit is going to be very hard to detect in practice because of the cheap conditional check up-front before doing the full diff.
This will slowup the cleanup in case if we preempt AM container, but may be more cleaner.
It won't slow how fast the container will be killed, if that's what you mean by "cleanup case." Only the NM can kill it anyway, and it won't know to do so until it subsequently heartbeats. It will slow down how fast the RM will re-schedule the resource associated with the preempted container, since it will wait until the NM confirms the container completion before releasing the resources within the scheduler bookkeeping and re-allocating them. This means today the RM can, and does, accidentally overcommit nodes because it considers the resources free before they actually are free. Filed
YARN-5290 as we've recently seen this in practice on some of our clusters.