Details
Description
Problem:
`dfs.client.use.datanode.hostname` by default is set to false, which means the client will use the IP address of the datanode to connect to the datanode, rather than the hostname of the datanode.
In `org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer`:
try { Peer peer = remotePeerFactory.newConnectedPeer(inetSocketAddress, token, datanode); LOG.trace("nextTcpPeer: created newConnectedPeer {}", peer); return new BlockReaderPeer(peer, false); } catch (IOException e) { LOG.trace("nextTcpPeer: failed to create newConnectedPeer connected to" + "{}", datanode); throw e; }
If `dfs.client.use.datanode.hostname` is false, then it will try to connect via IP address. If the IP address is illegal and the connection fails, IOException will be thrown from `newConnectedPeer` and be handled.
If `dfs.client.use.datanode.hostname` is true, then it will try to connect via hostname. If the hostname cannot be resolved, UnresolvedAddressException will be thrown from `newConnectedPeer`. However, UnresolvedAddressException is not a subclass of IOException so `nextTcpPeer` doesn’t handle this exception at all. This unhandled exception could crash the system.
Solution:
Since the method is handling the illegal IP address, then the illegal hostname should be also handled as well. One solution is to add the handling logic in `nextTcpPeer`:
} catch (IOException e) { LOG.trace("nextTcpPeer: failed to create newConnectedPeer connected to" + "{}", datanode); throw e; } catch (UnresolvedAddressException e) { ... // handling logic }
I am very happy to provide a patch to do this.
Go ahead!!!