Hadoop YARN / YARN-196

Nodemanager should be more robust in handling connection failure to ResourceManager when a cluster is started

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha, 3.0.0-alpha1
    • Fix Version/s: 2.1.0-beta
    • Component/s: nodemanager
    • Labels:
      None

      Description

      If the NodeManager (NM) is started before the ResourceManager (RM), the NM shuts down with the following error:

      ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager
      org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
      	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
      	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
      	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
      	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
      Caused by: java.lang.reflect.UndeclaredThrowableException
      	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
      	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
      	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
      	... 3 more
      Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
      	at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
      	at $Proxy23.registerNodeManager(Unknown Source)
      	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
      	... 5 more
      Caused by: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
      	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1141)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1100)
      	at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
      	... 7 more
      Caused by: java.net.ConnectException: Connection refused
      	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
      	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
      	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659)
      	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469)
      	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563)
      	at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211)
      	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1117)
      	... 9 more
      2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
      java.lang.InterruptedException
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
      	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
      	at java.lang.Thread.run(Thread.java:619)
      2012-01-16 15:04:13,337 INFO org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped.
      2012-01-16 15:04:13,392 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:9999
      2012-01-16 15:04:13,493 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.
      2012-01-16 15:04:13,493 INFO org.apache.hadoop.ipc.Server: Stopping server on 24290
      2012-01-16 15:04:13,494 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 24290
      2012-01-16 15:04:13,495 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
      2012-01-16 15:04:13,496 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler is stopped.
      2012-01-16 15:04:13,496 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
      java.lang.InterruptedException
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
      	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
      	at java.lang.Thread.run(Thread.java:619)
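
      The trace buries the actual failure, java.net.ConnectException: Connection refused, several wrappers deep (AvroRuntimeException, then UndeclaredThrowableException, then ServiceException, then ConnectException). A more robust NM would treat that condition as transient while the RM is still coming up and keep trying to register instead of aborting startup. The following is a hypothetical sketch of that idea, not the committed YARN-196 patch: the class RegisterWithRetry, its methods, the attempt limit and the backoff values are all illustrative assumptions, and the simulated call in main only stands in for ResourceTrackerPBClientImpl.registerNodeManager.

```java
import java.lang.reflect.UndeclaredThrowableException;
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical sketch (not the YARN-196 patch): names, limits and backoff values are illustrative.
public class RegisterWithRetry {

    static final int MAX_CAUSE_DEPTH = 32;       // guards against pathological cause cycles
    static final long MAX_BACKOFF_MS = 30_000L;  // assumed cap on the sleep between attempts

    // Follows getCause() to the innermost exception, e.g. the ConnectException in the trace.
    static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        for (int depth = 0; cur.getCause() != null && depth < MAX_CAUSE_DEPTH; depth++) {
            cur = cur.getCause();
        }
        return cur;
    }

    // True if any exception in the cause chain is a java.net.ConnectException.
    static boolean isConnectFailure(Throwable t) {
        Throwable cur = t;
        for (int depth = 0; cur != null && depth <= MAX_CAUSE_DEPTH; depth++) {
            if (cur instanceof ConnectException) {
                return true;
            }
            cur = cur.getCause();
        }
        return false;
    }

    // Runs the call; on a connection failure sleeps with exponential backoff and retries,
    // up to maxAttempts in total. Any other failure is rethrown immediately.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long initialBackoffMs)
            throws Exception {
        long backoff = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (!isConnectFailure(e) || attempt >= maxAttempts) {
                    throw e;
                }
                System.err.printf("RM not reachable (%s), attempt %d/%d, retrying in %d ms%n",
                        rootCause(e).getMessage(), attempt, maxAttempts, backoff);
                Thread.sleep(backoff);
                backoff = Math.min(backoff * 2, MAX_BACKOFF_MS);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated registration: the "RM" refuses the first two connections, then accepts.
        String result = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new UndeclaredThrowableException(new ConnectException("Connection refused"));
            }
            return "registered";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

      Running main prints "registered after 3 attempts" to stdout, with two retry messages on stderr. Retrying only when the cause chain contains a ConnectException keeps genuine errors fatal, and capping the number of attempts keeps a misconfigured RM address from leaving the NM hanging indefinitely.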
      

        Attachments

        1. YARN-196.9.patch (12 kB, Xuan Gong)
        2. YARN-196.8.patch (11 kB, Xuan Gong)
        3. YARN-196.7.patch (11 kB, Xuan Gong)
        4. YARN-196.6.patch (11 kB, Xuan Gong)
        5. YARN-196.5.patch (11 kB, Xuan Gong)
        6. YARN-196.4.patch (11 kB, Xuan Gong)
        7. YARN-196.3.patch (9 kB, Xuan Gong)
        8. YARN-196.2.patch (3 kB, Xuan Gong)
        9. YARN-196.12.patch (12 kB, Xuan Gong)
        10. YARN-196.12.1.patch (12 kB, Hitesh Shah)
        11. YARN-196.11.patch (12 kB, Xuan Gong)
        12. YARN-196.10.patch (12 kB, Xuan Gong)
        13. YARN-196.1.patch (3 kB, Xuan Gong)
        14. MAPREDUCE-3676.patch (1 kB, B Anil Kumar)

              People

              • Assignee: Xuan Gong (xgong)
              • Reporter: Ramgopal N (ramgopalnaali)
              • Votes: 1
              • Watchers: 13
