Updated patch addressing the above comments and my own findings off Sid's last patch.
- Fixed some formatting issues.
- Added more comments where necessary.
- Moved nextMasterKey and corresponding method in BaseContainerTokenSecretManager to RMContainerTokenSecretManager as it belongs there.
- Added explanation of RMContainerTokenSecretManager.activationDelay.
- Fixed a bug in MasterKeyPBImpl.java equals method.
- Collapsed appContainers and containerIdToMasterKeyMap into a single data structure and named it oldMasterKeys. Also declared it to be ConcurrentMap to be clear.
- startContainerSuccessful() had unhandled exception on remoteUGI errors. Fixed them.
- We don't need multiple keys to be sent in node-registration also. Also with multiple keys, setMasterKey() and setOldMasterKey() overwrite each other. Fixed all of this by moving to a single key - the current key to be sent during registration.
Should "Attempt to use an old secret key ..." but something more like "Attempt to relaunch a container"?
Done. Changed to "Attempt to relaunch the same container with id".
NMTokenSecretManager - retrievePassword has a fallback to the current master key when it knows the token is invalid. Why not throw InvalidToken exception?
True. Fixed to throw InvalidToken exception.
ContainerManagerImpl - authorizeRequest assuming one and only one token is a landmine. Should either throw if more than one, or use the token selector to get the intended one.
Done. Checked RPC code too, it sets only the required ID, but just to be sure added code to select the identifier that is needed.
ContainerManagerImpl - startContainerSuccessful is assuming one and only one token. Perhaps pass it the token validated in authorizeRequest?
Done. Same as above.
ContainerManagerImpl - isValidStartContainerRequest(tokenId) is invoked before the null check for tokenId
Should refs to BaseContainerTokenSecretManager be RMContainerTokenSecretManager?
ContainerManagerImpl - Container starts verify the token hasn't already been used. Should status & stop maybe verify it has been used, ie. started?
We don't want to do this as in some cases, a getStatus()/stopContainer() may come before a start, we want to handle that gracefully.
Important: appFinished "forgets" containers already launched, thus allowing tokens to relaunch containers for 48 hours. The token essentially needs to be cancelled after use which probably means remembering them until hard expiration.
The relaunch can happen only after app finishes and only for 24hrs (rolling interval) + 10mins (tokens generated off old secret-key during the last container-expiry interval time). We will need to cache the list of containers or just apps. I am going to fix this separately, will reopen
General questions: In general, it seems like so many objects shouldn't need to hold a reference to the secret managers. Isn't there a context that can hold a ref?
Sure, done this on both NM and NM side. Patch blew up because of this.
Using events to sync the secret keys between the RMContainerSecretManager and ResourceTrackerService (which actually has a ref to the secret manager!) is rather complex.
I agree too. Fixed it to directly access from the SecretManager.