Maintaining both an ipAddr/hostName and a nodeAddr that hold the same information, and can therefore become inconsistent, is error prone. For example, what do you do when the ipAddr and the nodeAddr disagree?
They should never disagree because the nodeAddr is based on the ipAddr, and when the nodeAddr is changed, so is the ipAddr.
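Here is a minimal Java sketch of the invariant described above, using a simplified stand-in class. `DatanodeAddr`, `setIpAddr`, and `getNodeAddr` are hypothetical names, not the real `DatanodeID` API, and the `ip:port` format of `nodeAddr` is an assumption. The only mutator rewrites both fields, so they cannot drift apart.

```java
/**
 * Hypothetical, simplified stand-in for a DataNode identity.
 * Not the real HDFS DatanodeID class; names are illustrative only.
 */
final class DatanodeAddr {
  private String ipAddr;       // e.g. "10.0.0.1"
  private final int xferPort;  // data transfer port
  private String nodeAddr;     // derived: ipAddr + ":" + xferPort (assumed format)

  DatanodeAddr(String ipAddr, int xferPort) {
    this.xferPort = xferPort;
    setIpAddr(ipAddr); // initialize both fields through the single setter
  }

  /** The only mutator: updates ipAddr and the derived nodeAddr together. */
  void setIpAddr(String ipAddr) {
    this.ipAddr = ipAddr;
    this.nodeAddr = ipAddr + ":" + xferPort;
  }

  String getIpAddr() { return ipAddr; }
  int getXferPort() { return xferPort; }
  String getNodeAddr() { return nodeAddr; }
}
```

Making `nodeAddr` a computed getter instead of a stored field would rule out inconsistency entirely. Storing it, as here, mirrors the approach of always updating both fields together.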
The ipAddr field of a DatanodeID should never change, because it (together with the xferPort) forms the unique key for a DataNode.
They will change when a pre-existing node, say one with the same storage ID, is updated with the new information.
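Here is a sketch of the re-registration case, using a hypothetical in-memory registry keyed by storage ID. `Registry`, `RegisteredNode`, and `updateAddr` are illustrative names, not HDFS classes. When a known storage ID registers again from a new address, the existing entry is updated in place, and both address fields change in one call.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical node record; ipAddr and nodeAddr are only ever set together. */
final class RegisteredNode {
  final String storageId;
  private String ipAddr;
  private int xferPort;
  private String nodeAddr; // derived from ipAddr and xferPort

  RegisteredNode(String storageId, String ipAddr, int xferPort) {
    this.storageId = storageId;
    updateAddr(ipAddr, xferPort);
  }

  /** Single mutator, so the derived nodeAddr can never disagree with ipAddr. */
  void updateAddr(String ipAddr, int xferPort) {
    this.ipAddr = ipAddr;
    this.xferPort = xferPort;
    this.nodeAddr = ipAddr + ":" + xferPort;
  }

  String getIpAddr() { return ipAddr; }
  String getNodeAddr() { return nodeAddr; }
}

/** Hypothetical registry keyed by storage ID. */
final class Registry {
  private final Map<String, RegisteredNode> byStorageId = new HashMap<>();

  /** Registers a node, updating the existing entry if the storage ID is known. */
  RegisteredNode register(String storageId, String ipAddr, int xferPort) {
    RegisteredNode existing = byStorageId.get(storageId);
    if (existing != null) {
      // Same storage ID, so the same node at a new address: update both fields at once.
      existing.updateAddr(ipAddr, xferPort);
      return existing;
    }
    RegisteredNode node = new RegisteredNode(storageId, ipAddr, xferPort);
    byStorageId.put(storageId, node);
    return node;
  }
}
```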
We also now have to worry about the state where we're both resolved and unresolved.
We need to worry about that case just as the code did before. Suppose the exclude list contains hostnames. A node registers, but there's a DNS hiccup, so all we have is its IP. Your proposed patch may let the node in, whereas the existing code (and my patch) will block it.
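Here is one way to get the blocking behavior described above, sketched in Java under stated assumptions. It is not the existing HDFS host-file logic. `HostCheck.isAllowed` is hypothetical, and the resolver is passed in as a function so a DNS failure can be modeled as an empty `Optional`. The exclude list may mix hostnames and IPs. When the lookup fails and the exclude list is non-empty, the check fails closed, because it cannot rule out a hostname entry matching the node.

```java
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical admission check against an exclude list; not HDFS's host-file code. */
final class HostCheck {
  /**
   * @param ip       address of the registering node
   * @param resolver reverse lookup of ip; returns Optional.empty() on a DNS failure
   * @param excludes exclude list; entries may be hostnames or IPs
   */
  static boolean isAllowed(String ip, Function<String, Optional<String>> resolver,
                           Set<String> excludes) {
    if (excludes.contains(ip)) {
      return false; // excluded directly by IP
    }
    Optional<String> host = resolver.apply(ip);
    if (!host.isPresent()) {
      // DNS hiccup: a hostname entry might match this node, but we cannot
      // tell, so fail closed unless there is nothing to exclude.
      return excludes.isEmpty();
    }
    return !excludes.contains(host.get());
  }
}
```

With an exclude list containing only `bad.example.com`, a node whose lookup fails is rejected rather than admitted.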
What do you think of the attached patch? It sets the DatanodeID hostname field at registration time (like the IP addr) ...
The patch appears to change how the include and exclude lists work by trusting who the datanode claims to be. What if a datanode "lies" about who it is? Or what if a DNS hiccup occurs when the datanode registers? It sends its name as an IP, but the exclude list only has hostnames. There are a number of scenarios where a datanode could bypass the include/exclude lists, which is why we should never trust the client.
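Here is a hypothetical sketch contrasting the two policies. `RegistrationRequest` and `Admission` are illustrative names. The first policy trusts the name the datanode reports. The second uses only what the server observed: the connection's peer IP and the server's own lookup of it. Suppose a node whose hostname is on the exclude list reports its bare IP instead. The claim-based check admits it, and the observation-based one does not.

```java
import java.util.Optional;
import java.util.Set;

/** Hypothetical registration payload; every field in it is client-supplied. */
final class RegistrationRequest {
  final String claimedName; // untrusted: may be a hostname, a bare IP, or a lie
  RegistrationRequest(String claimedName) { this.claimedName = claimedName; }
}

/** Hypothetical admission policies over an exclude list of hostnames and/or IPs. */
final class Admission {
  /** Unsafe: decides from whatever name the datanode reports about itself. */
  static boolean admitByClaim(RegistrationRequest req, Set<String> excludes) {
    return !excludes.contains(req.claimedName);
  }

  /**
   * Safer: decides only from what the server observed. observedIp is the peer
   * address of the connection; resolvedName is the server's own lookup of it,
   * empty if that lookup failed.
   */
  static boolean admitByObservation(String observedIp, Optional<String> resolvedName,
                                    Set<String> excludes) {
    if (excludes.contains(observedIp)) {
      return false;
    }
    // A failed lookup cannot rule out a hostname entry, so fail closed.
    return resolvedName.map(h -> !excludes.contains(h)).orElse(excludes.isEmpty());
  }
}
```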
... using the same lookup we do today and replaces the two problematic lookups with uses of this field.
Unless I've overlooked something, there's only one lookup that occurs?
I'll post a minor rev for consideration that should further ensure the fields never go out of sync.