On startup, the Datanode creates an InetSocketAddress to register with each namenode. Though there are retries on connection failure throughout the stack, the same InetSocketAddress is reused.
InetSocketAddress is an interesting class: it resolves the DNS name to an IP address in its constructor, and the result is never refreshed. In some cases Hadoop re-creates an InetSocketAddress just in case the remote IP behind a DNS name has changed: https://issues.apache.org/jira/browse/HADOOP-7472.
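To illustrate the resolve-once behavior, here is a minimal standalone sketch (not Hadoop code; `refreshIfUnresolved` is a hypothetical helper that mimics the HADOOP-7472 re-creation trick). The `.invalid` TLD is reserved (RFC 2606), so a name under it never resolves:

```java
import java.net.InetSocketAddress;

public class ResolveDemo {
    // InetSocketAddress does its DNS lookup once, in the constructor.
    // If the lookup fails, the instance stays "unresolved" forever; the
    // only way to retry the lookup is to construct a new instance, which
    // is essentially what HADOOP-7472 does when an IP may have changed.
    static InetSocketAddress refreshIfUnresolved(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            // Constructing a new address retries the DNS lookup.
            return new InetSocketAddress(addr.getHostName(), addr.getPort());
        }
        return addr;
    }

    public static void main(String[] args) {
        // "localhost" resolves at construction time.
        InetSocketAddress ok = new InetSocketAddress("localhost", 8020);
        System.out.println("localhost unresolved? " + ok.isUnresolved()); // false

        // A name under the reserved .invalid TLD never resolves.
        InetSocketAddress bad = new InetSocketAddress("namenode.invalid", 8020);
        System.out.println("namenode.invalid unresolved? " + bad.isUnresolved()); // true

        // Re-creating the address retries the lookup (and here fails again).
        System.out.println("still unresolved after retry? "
            + refreshIfUnresolved(bad).isUnresolved()); // true
    }
}
```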
Anyway, on startup, you can see the Datanode log "Namenode...remains unresolved" – referring to the fact that the DNS lookup failed.
The Datanode then proceeds to use this unresolved address, since it may still work if the DN is configured to use a proxy. Since I'm not using a proxy, it prints the same retry message forever.
Unfortunately, the log doesn't contain the exception that triggered it, but the culprit is actually in IPC Client: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444.
This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 to give a clear error message when somebody misspells an address.
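For context on why that fail-fast check is reasonable on its face: a plain Socket can't connect through an unresolved address anyway – `Socket.connect` throws UnknownHostException for one, and only a SOCKS proxy (which forwards the hostname) could make it work. A small standalone demo (class and method names are mine, not Hadoop's):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

public class UnresolvedConnectDemo {
    // Attempts a direct (non-proxied) connection and reports the outcome.
    public static String tryConnect(InetSocketAddress addr) {
        try (Socket s = new Socket()) {
            s.connect(addr, 1000);
            return "connected";
        } catch (UnknownHostException e) {
            // Thrown by Socket.connect when the address is unresolved.
            return "unknown host";
        } catch (IOException e) {
            return "io error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // .invalid is a reserved TLD (RFC 2606), so this never resolves.
        InetSocketAddress bad = new InetSocketAddress("namenode.invalid", 8020);
        System.out.println(tryConnect(bad)); // unknown host
    }
}
```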
However, the fix in HADOOP-7472 doesn't apply here, because that code runs in Client#getConnection, after the Connection has already been constructed.
My proposed fix (will attach a patch) is to move this exception out of the constructor and into a place that triggers HADOOP-7472's logic to re-resolve addresses. If the DNS failure was temporary, this allows the connection to succeed; if not, the connection fails once the ipc client exhausts its retries (by default, about 10 seconds' worth).
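The intended behavior can be sketched as follows. This is a minimal, self-contained illustration of the re-resolve-on-retry pattern, not the actual patch; `connectWithReresolve` and its retry count are hypothetical stand-ins for the ipc client's retry machinery:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

public class RetryResolveSketch {
    // Instead of failing permanently on an unresolved address created at
    // startup, check the address on each connect attempt and re-create it
    // so the DNS lookup is retried. A transient DNS failure then heals
    // within the normal retry window; a permanent one still fails, but
    // only after the retries are exhausted.
    static Socket connectWithReresolve(String host, int port, int maxRetries)
            throws IOException {
        InetSocketAddress addr = new InetSocketAddress(host, port);
        IOException lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (addr.isUnresolved()) {
                // Re-creating the address retries the DNS lookup
                // (the same trick HADOOP-7472 uses for changed IPs).
                addr = new InetSocketAddress(host, port);
            }
            if (addr.isUnresolved()) {
                lastFailure = new UnknownHostException("unknown host: " + host);
                continue; // DNS still failing; try again next iteration.
            }
            try {
                Socket s = new Socket();
                s.connect(addr, 1000);
                return s;
            } catch (IOException e) {
                lastFailure = e;
            }
        }
        throw lastFailure;
    }
}
```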
I want to fix this in the ipc client rather than just in Datanode startup, since that fixes temporary DNS issues for all of Hadoop.