Hadoop components are increasingly being deployed on VMs and containers. One aspect of these environments is that DNS is dynamic: hostname records are modified (or deleted and recreated) whenever a Kubernetes container (or even a VM) is created or recreated. In such dynamic environments, the initial DNS resolution request may briefly fail because the DNS client does not always see the latest records. This has been observed in Kubernetes in particular. In such cases NetUtils.connect() appears to throw java.nio.channels.UnresolvedAddressException. Much of the Hadoop code (for example DFSInputStream and DFSOutputStream) is designed to retry on IOException. However, UnresolvedAddressException is not a subclass of IOException (it is an unchecked exception derived from IllegalArgumentException), so no retry happens and the code aborts immediately. It would be much better if NetUtils.connect() threw java.net.UnknownHostException instead, since that is derived from IOException and callers would treat it as a retryable error.
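The proposed behavior can be illustrated with a minimal sketch. This is not the actual NetUtils code; the class name ConnectSketch and the method connectTranslating are hypothetical and only show the idea of translating the unchecked exception into an IOException subclass at the connect call site.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

// Hypothetical illustration only; not the real NetUtils implementation.
final class ConnectSketch {

    // Connects the channel, converting the unchecked UnresolvedAddressException
    // into UnknownHostException so that callers catching IOException can retry.
    static void connectTranslating(SocketChannel channel, InetSocketAddress endpoint)
            throws IOException {
        try {
            channel.connect(endpoint);
        } catch (UnresolvedAddressException e) {
            UnknownHostException uhe = new UnknownHostException(endpoint.getHostString());
            uhe.initCause(e); // keep the original exception for diagnostics
            throw uhe;
        }
    }

    public static void main(String[] args) {
        // createUnresolved() skips the DNS lookup, which simulates a hostname
        // that could not be resolved at connect time.
        InetSocketAddress endpoint =
                InetSocketAddress.createUnresolved("no-such-host.invalid", 8020);
        try (SocketChannel channel = SocketChannel.open()) {
            connectTranslating(channel, endpoint);
        } catch (IOException e) {
            // A retry loop written around IOException now sees this failure.
            System.out.println("Retryable failure: " + e);
        }
    }
}
```

With this translation in place, existing retry logic that catches IOException would see the resolution failure without any change on the caller side.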