Hi Rushabh Shah,
Thanks again for your work. Below are my comments in reply to yours. Just saw Andrew Wang's comment, thanks Andrew!
final FileEncryptionInfo feInfo = dfsos.getFileEncryptionInfo();
final CryptoCodec codec = getCryptoCodec(conf, feInfo);
This is in createWrappedOutputStream, where conf is the configuration of the local cluster. The local configuration may differ from the remote cluster's, so it's possible to fail here.
Sorry I was a bit unclear earlier: this comment is not about the change in this jira, but about the existing implementation. My concern is about the Codec rather than the keyProvider here. We get feInfo from the target file, then get the Codec based on conf and feInfo. The conf here is the configuration of the local cluster. When the target is a remote cluster, could we fail to find the codec, or find the wrong codec, because we don't have the remote cluster's configuration? That's what I wanted to say. So I hope Andrew Wang, who was involved in the original development of the encryption feature, can comment. (Andrew, I saw your comment above, but would you please look at my comment here again and see if it makes sense?)
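To make the concern concrete, here is a rough paraphrase of the codec resolution path (modeled loosely on DFSClient#getCryptoCodec; the helper name and exception message below are mine, not the actual code):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.CipherSuite;
import org.apache.hadoop.crypto.CryptoCodec;
import org.apache.hadoop.fs.FileEncryptionInfo;

static CryptoCodec resolveCodecLocally(Configuration conf, FileEncryptionInfo feInfo)
    throws IOException {
  // The cipher suite comes from the remote file's FileEncryptionInfo...
  final CipherSuite suite = feInfo.getCipherSuite();
  // ...but CryptoCodec.getInstance resolves the implementation from *local*
  // configuration keys such as
  // hadoop.security.crypto.codec.classes.aes.ctr.nopadding, so a local conf
  // that differs from the remote cluster's could yield no codec (null) or a
  // different implementation than the remote cluster expects.
  final CryptoCodec codec = CryptoCodec.getInstance(conf, suite);
  if (codec == null) {
    throw new IOException("No crypto codec configured locally for " + suite);
  }
  return codec;
}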
2. It looks like having a HADOOP_SECURITY_KEY_PROVIDER_PATH_DEFAULT constant would help make the code more consistent, even for the patch here, but having a separate jira works for me too.
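For illustration, the constant could sit next to the existing key (a minimal sketch; the empty-string default is my assumption, not something settled in this jira):

// In CommonConfigurationKeysPublic; the _PATH key already exists there,
// the _DEFAULT constant is the proposed addition (value assumed here).
public static final String HADOOP_SECURITY_KEY_PROVIDER_PATH =
    "hadoop.security.key.provider.path";
public static final String HADOOP_SECURITY_KEY_PROVIDER_PATH_DEFAULT = "";

// Call sites could then read the value consistently:
String keyProviderUriStr = conf.getTrimmed(
    HADOOP_SECURITY_KEY_PROVIDER_PATH, HADOOP_SECURITY_KEY_PROVIDER_PATH_DEFAULT);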
3. Keeping the name keyProviderUri instead of keyProviderPath is actually fine. I did some study before creating patch rev3, and the format appears to be a URI. I wish HDFS-10489 had used "uri" instead of "path", so I did not change uri to path in rev3.
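For example, a typical value is itself a URI pointing at a KMS (host and port are placeholders):

kms://http@kms01.example.com:9600/kms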
4. About adding the two methods to add to / get from the credentials: it's a way of encapsulating the handling in one place and sharing how the key in the map is generated (e.g. uri.getScheme() + "://" + uri.getAuthority()). You can see my example in rev3, sketched below. These methods are also called in the test code.
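Something along these lines (a minimal sketch; the class name, CREDS_PREFIX value, and method names are illustrative, not necessarily what rev3 uses):

import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;

public final class KeyProviderCredsUtil {
  // Assumed prefix; the point is that both helpers derive the map key
  // the same way, so set and get can never drift apart.
  private static final String CREDS_PREFIX = "dfs.keyProviderUri.";

  // Build the map key from the namenode URI, e.g. "hdfs://nameservice1".
  private static Text toCredsKey(URI namenodeUri) {
    return new Text(CREDS_PREFIX + namenodeUri.getScheme() + "://"
        + namenodeUri.getAuthority());
  }

  // Record the key provider URI for the given cluster in the job credentials.
  public static void setKeyProviderUri(Credentials creds, URI namenodeUri,
      String keyProviderUri) {
    creds.addSecretKey(toCredsKey(namenodeUri),
        keyProviderUri.getBytes(StandardCharsets.UTF_8));
  }

  // Look up the key provider URI for the given cluster; null if not recorded.
  public static String getKeyProviderUri(Credentials creds, URI namenodeUri) {
    byte[] bytes = creds.getSecretKey(toCredsKey(namenodeUri));
    return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
  }

  private KeyProviderCredsUtil() {}
}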
I don't think key provider is used by WebHDFSFileSystem. Maybe I'm missing something.
Can you please elaborate your comment?
I was just guessing and I'm not so sure, so I hope Daryn Sharp can comment.
7. About your question w.r.t. public boolean isHDFSEncryptionEnabled() throwing StandbyException: there is a solution, that is, we need to incorporate the remote cluster's nameservices configuration into the client (distcp, for example) configuration, and let the client handle the NN failover and retry. We need to document this.
For an HA cluster, we can access the NN via its nameservice (for example, hadoop distcp hdfs://nameservice1/xyz hdfs://nameservice2/abc), so the StandbyException can be detected and a different NN will be tried automatically. See https://issues.apache.org/jira/browse/HDFS-6376. We actually found a non-intrusive way to do this without using "dfs.internal.nameservices": we copy the local cluster's conf to a new dir, append the nameservice portion of the remote cluster's conf to the hdfs-site.xml in the new dir, and then pass the new dir to distcp (see the sketch below).
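Roughly, the appended portion of hdfs-site.xml would look like this (a sketch only; "nameservice2", the NN ids, and the hostnames are placeholders for the remote cluster's actual HA settings):

<property>
  <name>dfs.nameservices</name>
  <value>nameservice1,nameservice2</value>
</property>
<property>
  <name>dfs.ha.namenodes.nameservice2</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice2.nn1</name>
  <value>remote-nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice2.nn2</name>
  <value>remote-nn2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.nameservice2</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

Then distcp can be run against the new dir, e.g. hadoop --config /path/to/new-conf-dir distcp hdfs://nameservice1/xyz hdfs://nameservice2/abc.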