Thanks Andrew Wang for the review! Appreciate your comments.
Patch 5 addresses some of them, with the exceptions open for discuss:
Regarding moving the config key, I think the other Renewers get around this by embedding the static class within the parent class and accessing required state statically. I think the parent here would be KMSClientProvider. It wouldn't be good to tie renewal to this HDFS key anyway, since the KMS is used for more than just HDFS encryption.
Let's talk about this in more details.
I think there're 2 things mixed here:
- The renewer could be a static inner class of KMSClientProvider.
Sure, I moved the renewer into KMSCP.
- ...accessing required state statically...
I don't think this is true. Since it's static it wouldn't read any states, just configs, right?
Among those renewers, TokenAspect and MRDelegationTokenRenewer are not embedded. Both of them plus the renewers embedded in TimelineDelegationTokenIdentifier and RMDelegationTokenIdentifier all create the instance to renewDelegationToken using conf and token. For KMS, we can't get the url from the token because 1) it doesn't have proto (i.e. http v.s. https) 2) it's not HA-aware.
So I think we need to read from config, which is what KMSUtil#createKeyProvider does when it calls into KMSCP#createProvider.
I think Wei-Chiu Chuang's proposal may work (thanks Wei-Chiu!), but I also found
HDFS-7004 when the dfs config was added, and the linked HADOOP-11054 saying 'In the case of HDFS encryption, it would be desirable to be explicitly specify a KeyProvider URI to avoid obscure misconfigurations.'. IMO, KMS being a general component should just use the common hadoop.security.key.provider.path. Reused the logic of DFSUtilClient in KMSUtil, and updated callers with proper keys.
Why does renewal in particularly need a URL with USER_NAME set? IIUC this is needed for PseudoAuthentication, but here we're doing DT authentication?
Good catch!! I guess I was not super clear on the concept so I missed it: Initially PseudoAuth was failing and I debugged and added user name to let it pass. However, since we're using DT here we definitely should just use DTAuth. After more background work, I found out theres a bug in DelegationTokenAuthenticatedURL when handling delegation tokens for renew/cancel/add. (openConnection was done right, in
HADOOP-10880, but seems missed renew/cancel/add.). I fixed it in DTAuthenticatedURL and DTAuthenticator, by using a similar fall back logic. (Can't do it in DTAuthenticatedURL along because the connection object is created in DTAuthenticator for those methods.)
Unit test on this when DelegationTokenAuthenticatedURL#useQueryStringforDelegationToken == false seems to be non-trivial and not easy to mock/spy, since it's all created on the fly. Considering this is supposed to be deprecated, I manually tested it by setting the default to true.