From a design point of view, would it be possible to just add an attribute or flag on HDFS files or directories that specifies a block affinity group? This would seem cheaper than an alternative that specifies lists of favored DNs for block replicas. It would also seem more robust than something specified only at file creation time, and more manageable in the long term if datanode membership changes over time.
+1, all of this makes sense to me.
in the use case of HBase, region files are rewritten as part of compaction, and that would again create the blocks in the favored nodes...
True, though I can imagine uses other than HBase region files that could benefit from a feature like this. For example, some file formats that are used in joins could benefit from HDFS trying to place the replicas of a few separate files on the same set of DNs. In that case we shouldn't assume that the files will be short-lived or rewritten during a compaction process.
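To make the co-location idea a bit more concrete, here is a rough sketch of how a group name alone could determine a stable set of target DNs. This is purely illustrative: `AffinityGroupSketch`, `score`, and `pickTargets` are hypothetical names, not anything in HDFS or in the patch, and the selection uses rendezvous (highest-random-weight) hashing as a stand-in for whatever policy the NameNode would actually apply. It also ignores racks, load, and free space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in HDFS.
final class AffinityGroupSketch {

    // Deterministic pseudo-random weight for a (group, datanode) pair.
    static long score(String group, String datanode) {
        String key = group + "/" + datanode;
        long h = 1125899906842597L;
        for (int i = 0; i < key.length(); i++) {
            h = 31 * h + key.charAt(i);
        }
        // Extra bit mixing so similar names do not get similar weights.
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        return h;
    }

    // Pick the `replication` highest-weighted live datanodes for a group.
    // Every file tagged with the same group gets the same targets, and a
    // datanode leaving the cluster only affects groups that had chosen it.
    static List<String> pickTargets(String group, List<String> liveDatanodes,
                                    int replication) {
        List<String> sorted = new ArrayList<>(liveDatanodes);
        sorted.sort((a, b) -> {
            int c = Long.compare(score(group, b), score(group, a));
            return c != 0 ? c : a.compareTo(b); // tie-break by name
        });
        int n = Math.min(replication, sorted.size());
        return new ArrayList<>(sorted.subList(0, n));
    }

    public static void main(String[] args) {
        List<String> dns = Arrays.asList("dn1", "dn2", "dn3", "dn4", "dn5");
        // Two separate files in the same (made-up) group land on the same DNs.
        System.out.println("orders.dat    -> " + pickTargets("join-group-1", dns, 3));
        System.out.println("customers.dat -> " + pickTargets("join-group-1", dns, 3));
        System.out.println("other file    -> " + pickTargets("join-group-2", dns, 3));
    }
}
```

The point of deriving targets from the group rather than storing an explicit DN list is that nothing has to be rewritten when datanodes come and go: the same function can be re-evaluated against the current live set.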
Of course, persistence of these hints could be done as a separate JIRA, but as part of this JIRA we might consider whether the API could be made appropriate for both use cases: long-lived and short-lived files. For that matter, we might deliberately make these hints non-persistent in the branch-1 implementation, so as to avoid having to bump the edit log version number, but persistent in the trunk/branch-2 implementation.
but let me get to the next level of detail on that.
Sorry, I don't understand. What do you mean by this?
Also, I realize that the current patch is intended to be WIP, but it also appears to be targeted at branch-1. Before we can commit this to branch-1, we'll need to have a trunk/branch-2 patch.