Thanks Subru Krishnan for the review! I've addressed most comments in v3 patch (as well as ones from Jian He). For the rest, please see below:
Do we need a UnmanagedAMPoolManager per interceptor instance or can we use one at AMRMProxyService level?
It is easier the current way because we need to constantly get all UAM associated with one application (keyed by subClusterId).
If we do one pool per AMRMProxy, then we probably need to key UAM with appId+subclusterId. The search for UAMs associated with one application will not be straight forward.
Is updating the queue below safe in loadAMRMPolicy
Yes, the variable queue is a local string, used by only the policy manager.
I feel the finishApplicationMaster of the pool should be moved to UnmanagedAMPoolManager.
Yes we can choose to. However it will likely be a blocking call then, where we loose the freedom to schedule the tasks, synchronously call finish in home, and then wait for the secondaries to come back. Or, we need addition interface in UAMPoolManager, one for schedule and one for fetch result. I've added a TODO for this.
I see dynamic instantiations of ExecutorCompletionService in finish, register, etc invocations. Wouldn't we be better served by pre-initializing it?
We need to create them locally because of concurrency. The allocate and finish calls can be invoked concurrently. Sharing the same completion service object will confuse the tasks submitted from both sides.
Is getSubClusterForNode required as the resolver should be doing this instead of every client?
AbstractSubClusterResolver.getSubClusterForNode throws when resolving an unknown node, we don't want to throw in this case, and thus need to catch and log the warning.
Move YarnConfiguration outside the for loop in registerWithNewSubClusters
We cannot because we need a different config per UAM, loaded with the sub-cluster id
Consider looping on registrations in lieu of requests in sendRequestsToSecondaryResourceManagers
Registration only contains the newly added secondary sub-clusters, while we need to loop over (send heartbeat to) all known secondaries here.