Jitendra, thank you for taking a look at this patch.
...it seems the encrypted key is obtained from namenode via rpc for every block...
Actually, we cache the encryption key so that we don't need to keep repeating that RPC. (This is true on current trunk and with my patch too.) The key retrieval is now wrapped behind the DataEncryptionKeyFactory interface. There are 2 implementors of this: the DFSClient itself and the NameNodeConnector used by the balancer. In both of those classes, if you look at the newDataEncryptionKey method, you'll see that they lazily fetch a key and cache it for as long as the key expiry.
getEncryptedStreams doesn't use access token. IMO the user and the password should be derived from the accesstoken rather than the key.
Thanks for catching that. This is a private method, so I can easily remove access token from the signature. We can't change the user/password calculation for the encrypted case now without breaking compatibility.
It might make sense to define the defaults for the new configuration variables in hdfs-default and/or as constants. It helps in code reading at times.
The patch documents the new properties dfs.data.transfer.protection and dfs.data.transfer.saslproperties.resolver.class in hdfs-default.xml. The default values are set to empty/undefined. I think this is what we want, because it's an opt-in feature. Let me know if you had any other configuration properties in mind.
Log.debug should be wrapped inside if (Log.isDebugEnabled()) condition.
The new classes use slf4j. (There was some discussion on mailing lists a few months ago about starting to use this library in new classes.) With slf4j, it's no longer necessary to check isDebugEnabled. slf4j accepts string substitution variables using varargs, and it checks the log level internally first before doing any string concatenation. Explicitly checking isDebugEnabled wouldn't provide any performance benefit.
checkTrustAndSend obtains new encryption key, irrespective of the qop needed. I believe the encryption key is needed only for specialized encryption case.
The 2 implementations of DataEncryptionKeyFactory mentioned above only retrieve an encryption key if encryption is enabled (NameNode is configured with dfs.encrypt.data.transfer=true). For a deployment configured with SASL on DataTransferProtocol, this will be false, so it won't actually get a key. I'll put a comment in SaslDataTransferClient to clarify this.
SaslDataTransferClient object in NameNodeConnector.java seems out of place, the NameNodeConnector is supposed to encapsulate only namenode connections. Can we avoid the saslClient in this class?
Yeah, what was I thinking there? This is needed by the balancer for its DataNode communication when it needs to move blocks. Let me see if I can move it right into the Balancer class.
RemotePeerFactory.java: Javadoc needs update.
Will do. Thanks for the catch.
Minor nit: checkTrustAndSend returns null for skipping handshake which has to be checked in the caller. It could just return the same stream pair, which otherwise every caller has to do.
I actually need to use null as a sentinel value. In peerSend, I need to know whether or not a SASL handshake was performed, and if so, wrap the peer in an instance of EncryptedPeer (which would be better named SaslPeer at this point, but we can refactor that later). If I returned a non-null IOStreamPair always, then I wouldn't be able to do this check.
I'll get to work on a new revision that incorporates your feedback. Thanks again!