The DatanodeID "name" field is currently overloaded. When the DN creates a DatanodeID to register with the NN it sets "name" to the datanode hostname, which is the DN's "hostName" member. This is not necessarily a FQDN: it is either set explicitly or determined by the DNS class, which could return the machine's hostname or the result of a DNS lookup, if configured to do so. The NN then clobbers the "name" field of the DatanodeID with the IP part of the new DatanodeID "name" field it creates (and sets the DatanodeID "hostName" field to the reported "name"). The DN gets the DatanodeID back from the NN and clobbers its "hostName" member with the "name" field of the returned DatanodeID.

This makes the code hard to reason about, e.g. DN#getMachineName sometimes returns a hostname and sometimes not, depending on when it's called relative to the registration. Ditto for uses of the "name" field. I think these contortions were originally performed because the DatanodeID didn't have a hostName field (it was part of DatanodeInfo), so there was no way to communicate both at the same time. Now that the hostName field is in DatanodeID (as of HDFS-3164) we can establish the invariant that the "name" field always and only has an IP address and the "hostName" field always and only has a hostname.
In HDFS-3144 I'm going to rename the "name" field so it's clear that it contains an IP address. The above is enough scope for one change.
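
To make the proposed invariant concrete, here is a rough sketch of what enforcing it at construction time could look like. This is not the real DatanodeID: the class name DatanodeIdSketch, the field name ipAddr, and the isIpv4Literal helper are all hypothetical, and the check is deliberately simplified to IPv4 dotted-quad literals with no DNS lookups.

```java
// Hypothetical sketch only; NOT org.apache.hadoop.hdfs.protocol.DatanodeID.
// It illustrates the invariant: one field always holds an IP address,
// the other always holds a hostname, and neither is overwritten later.
final class DatanodeIdSketch {
    private final String ipAddr;   // always and only an IP address
    private final String hostName; // always and only a hostname

    DatanodeIdSketch(String ipAddr, String hostName) {
        if (!isIpv4Literal(ipAddr)) {
            throw new IllegalArgumentException("not an IP address: " + ipAddr);
        }
        if (hostName == null || hostName.isEmpty() || isIpv4Literal(hostName)) {
            throw new IllegalArgumentException("not a hostname: " + hostName);
        }
        this.ipAddr = ipAddr;
        this.hostName = hostName;
    }

    String getIpAddr() { return ipAddr; }

    String getHostName() { return hostName; }

    // Assumption for this sketch: only IPv4 dotted-quad literals count as IPs.
    static boolean isIpv4Literal(String s) {
        if (s == null) {
            return false;
        }
        String[] parts = s.split("\\.", -1);
        if (parts.length != 4) {
            return false;
        }
        for (String p : parts) {
            if (p.isEmpty() || p.length() > 3) {
                return false;
            }
            for (int i = 0; i < p.length(); i++) {
                char c = p.charAt(i);
                if (c < '0' || c > '9') {
                    return false;
                }
            }
            if (Integer.parseInt(p) > 255) {
                return false;
            }
        }
        return true;
    }
}
```

With both fields final and validated up front, a getter returns the same kind of value regardless of whether it is called before or after registration, which is the property the current "name" handling lacks.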