  1. Hadoop Common
  2. HADOOP-17052

NetUtils.connect() throws unchecked exception (UnresolvedAddressException) causing clients to abort

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.10.0, 2.9.2, 3.2.1, 3.1.3
    • Fix Version/s: 2.9.3, 3.1.4, 3.2.2, 2.10.1, 3.3.1, 3.4.0
    • Component/s: net
    • Labels: None

    Description

      Hadoop components are increasingly deployed on VMs and in containers, where DNS is dynamic: hostname records are modified (or deleted and recreated) whenever a Kubernetes container (or even a VM) is created or recreated. In such environments, the initial DNS resolution request may briefly fail because the DNS client does not always see the latest records. This has been observed in Kubernetes in particular. In such cases NetUtils.connect() appears to throw java.nio.channels.UnresolvedAddressException. Much of the Hadoop code (for example DFSInputStream and DFSOutputStream) is designed to retry on IOException. However, because UnresolvedAddressException is not a subclass of IOException, no retry happens and the client aborts immediately. It would be much better for NetUtils.connect() to throw java.net.UnknownHostException, which is derived from IOException, so that callers treat the failure as a retryable error.
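
      The proposed behavior can be sketched roughly as follows. This is only an illustration of the idea, not the actual NetUtils code or the committed patch: the class UnresolvedAddressSketch and its methods connect() and tryConnect() are hypothetical names, and the host name and port are placeholders. The sketch catches the unchecked UnresolvedAddressException raised by SocketChannel.connect() and rethrows it as UnknownHostException, so a caller that catches IOException gets a chance to retry.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.net.UnknownHostException;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

public class UnresolvedAddressSketch {

    /**
     * Hypothetical stand-in for a connect helper (not Hadoop's NetUtils).
     * Translates the unchecked UnresolvedAddressException into the checked
     * UnknownHostException, which is a subclass of IOException.
     */
    public static void connect(SocketChannel channel, SocketAddress endpoint)
            throws IOException {
        try {
            channel.connect(endpoint);
        } catch (UnresolvedAddressException e) {
            UnknownHostException uhe =
                new UnknownHostException("Cannot resolve " + endpoint);
            uhe.initCause(e); // keep the original exception for diagnostics
            throw uhe;
        }
    }

    /**
     * Hypothetical caller that, like the retry logic described above,
     * only handles IOException. Returns false on a retryable failure.
     */
    public static boolean tryConnect(SocketAddress endpoint) {
        try (SocketChannel channel = SocketChannel.open()) {
            connect(channel, endpoint);
            return true;
        } catch (IOException e) {
            // A real client would back off and retry here.
            return false;
        }
    }

    public static void main(String[] args) {
        // createUnresolved() skips the DNS lookup, which simulates a host
        // whose record is temporarily missing.
        SocketAddress endpoint =
            InetSocketAddress.createUnresolved("no-such-host.invalid", 8020);
        System.out.println("connected = " + tryConnect(endpoint));
    }
}
```

      Without the translation in connect(), the UnresolvedAddressException (an IllegalArgumentException subclass) would bypass the catch (IOException e) block in tryConnect() and propagate to the caller, which matches the abort described above.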

      Attachments

        1. read_failure.log
          0.9 kB
          Dhiraj Hegde
        2. write_failure1.log
          11 kB
          Dhiraj Hegde
        3. write_failure2.log
          3 kB
          Dhiraj Hegde

            People

              Assignee: dhegde Dhiraj Hegde
              Reporter: dhegde Dhiraj Hegde
