Thanks, Arpit Agarwal.
This patch shares many similarities with HDFS-4946, in the following ways:
This patch is orthogonal to the storage policy and to the original block placement policy (i.e., the rack-aware policy). The storage policy, local rack, etc. are all still honored. This patch basically adds one special case for excludeNodes. I'd say the only thing it changes relates to HDFS-4946: it acts as the opposite case to HDFS-4946, but only on a per-client / per-DFSOutputStream basis. For cases similar to the one in this JIRA, using storage policy alone does not necessarily provide better data availability (e.g., HBase still writes to the local SSD).
I am also curious about the answer to Devaraj's question.
HDFS-2576 was added specifically for HBase. Can it address your use case?
To some extent. HDFS-2576 would need each DFSClient to hold the rest of the cluster in favoriteNodes to achieve the same purpose. It would also raise questions such as: would holding a subset of DataNodes in favoriteNodes affect the efficiency of data placement? Should the DFSClient constantly refresh this list of nodes? A similar argument applies to HDFS-4946 as well.
The NameNode ignores this CreateFlag.
I am not sure that I understand this question. It is still the BlockManager in the NameNode that makes the final decision on block placement (please see my first point). CreateFlag is just a user-visible flag for providing hints. These hints (and any future ones) are sent to the NameNode through ClientNamenodeProtocol RPCs and processed by the NameNode.
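To make the point above concrete, here is a toy, self-contained sketch of the idea. It is not the patch and not HDFS code: the names Hint, AVOID_LOCAL_NODE, chooseTargets and the node names are hypothetical, and the "pick nodes in list order" loop merely stands in for the real placement policy. It only shows that a client-supplied hint widens the excluded set while the server side still makes the final choice.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PlacementHintSketch {
    // Hypothetical client-visible hint; stands in for a CreateFlag value.
    enum Hint { AVOID_LOCAL_NODE }

    // Toy "NameNode-side" chooser. The hint only adds the client's own node
    // to the excluded set; node selection itself stays on the server side.
    static List<String> chooseTargets(List<String> liveNodes, String clientNode,
                                      Set<Hint> hints, Set<String> excludedNodes,
                                      int replication) {
        Set<String> excluded = new HashSet<>(excludedNodes);
        if (hints.contains(Hint.AVOID_LOCAL_NODE)) {
            excluded.add(clientNode); // the one special case for excluded nodes
        }
        List<String> targets = new ArrayList<>();
        // Assumption: list order stands in for the real (rack-aware) policy.
        for (String node : liveNodes) {
            if (targets.size() == replication) {
                break;
            }
            if (!excluded.contains(node)) {
                targets.add(node);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("dn1", "dn2", "dn3", "dn4");
        // Without the hint, the client's node dn1 is eligible.
        System.out.println(chooseTargets(nodes, "dn1",
                EnumSet.noneOf(Hint.class), Set.of(), 3)); // [dn1, dn2, dn3]
        // With the hint, dn1 is skipped for this stream only.
        System.out.println(chooseTargets(nodes, "dn1",
                EnumSet.of(Hint.AVOID_LOCAL_NODE), Set.of(), 3)); // [dn2, dn3, dn4]
    }
}
```

In this sketch the hint is per call, which mirrors the per-client / per-stream scope described earlier; nothing about cluster-wide configuration changes.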
it will only work for DFSClient users e.g. not for WebHDFS.
At this time, I am not certain that it will not work for WebHDFS. If that turns out to be the case, can we file a follow-up JIRA to fix it once the basic functionality is in place?
Is it honored for appends?
No, it only works for new blocks.
I hope the above explanations answer your questions, Arpit Agarwal. Looking forward to hearing from you.