Hadoop YARN / YARN-3944

Connection refused errors to NodeManagers are retried at multiple levels

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: 2.6.0
    • None
    • None
    • None

    Description

      This is related to YARN-3238. When a NodeManager (NM) is down, the IPC client gets a ConnectException:

      Caused by: java.net.ConnectException: Connection refused
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
      at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
      at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
      at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
      at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
      at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
      at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
      at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
      at org.apache.hadoop.ipc.Client.call(Client.java:1438)

      However, retries happen at two layers (the IPC client retries 40 times and serverProxy retries 91 times), so a single call to a dead NM could end up retrying for about 1 hour.
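
      The arithmetic behind the ~1 hour figure can be sketched as follows. This is only an illustration, not code from Hadoop or from the attached patch: the class and method names are hypothetical, and the 1-second delay per attempt is an assumption chosen because it is consistent with the stated total. The two retry counts (40 and 91) are taken from the description above.

```java
public class NestedRetryEstimate {

    // Total connection attempts when an outer retry loop wraps an inner one:
    // every outer attempt triggers a full round of inner attempts.
    static long totalAttempts(int innerAttempts, int outerAttempts) {
        return (long) innerAttempts * outerAttempts;
    }

    // Worst-case time spent retrying, in milliseconds, assuming a fixed
    // delay per attempt (an assumption; real delays depend on configuration).
    static long worstCaseMillis(int innerAttempts, int outerAttempts, long delayPerAttemptMs) {
        return totalAttempts(innerAttempts, outerAttempts) * delayPerAttemptMs;
    }

    public static void main(String[] args) {
        int ipcAttempts = 40;        // IPC-layer retries, from the issue description
        int proxyAttempts = 91;      // serverProxy-layer retries, from the issue description
        long assumedDelayMs = 1000L; // ASSUMPTION: roughly 1 second per attempt

        long attempts = totalAttempts(ipcAttempts, proxyAttempts);
        long minutes = worstCaseMillis(ipcAttempts, proxyAttempts, assumedDelayMs) / 60_000L;

        System.out.println(attempts + " attempts, about " + minutes + " minutes");
    }
}
```

      Because the layers multiply rather than add, 40 x 91 = 3640 attempts; under the assumed 1-second delay that is roughly an hour before the caller sees the failure.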

      Attachments

        1. YARN-3944.v1.patch
          1 kB
          Siqi Li

            People

              Assignee: Siqi Li
              Reporter: Siqi Li
              Votes: 0
              Watchers: 13

              Dates

                Created:
                Updated:
                Resolved: