Hadoop YARN / YARN-3634 (sub-task of YARN-2928, YARN Timeline Service v.2: alpha 1)

TestMRTimelineEventHandling and TestApplication are broken


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: YARN-2928
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: timelineserver
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      TestMRTimelineEventHandling is broken. Relevant error message:

      2015-05-12 06:28:56,415 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-12 06:28:57,416 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-12 06:28:58,416 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-12 06:28:59,417 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-12 06:29:00,418 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-12 06:29:01,419 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-12 06:29:02,420 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-12 06:29:03,420 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-12 06:29:04,421 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-12 06:29:05,422 INFO  [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector Service for application_1431412130291_0001
      2015-05-12 06:29:05,425 WARN  [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector and it got an error at event: CONTAINER_INIT
      org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
      	at org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.putIfAbsent(TimelineCollectorManager.java:97)
      	at org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.addApplication(PerNodeTimelineCollectorsAuxService.java:99)
      	at org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.initializeContainer(PerNodeTimelineCollectorsAuxService.java:126)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.handle(AuxServices.java:226)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.handle(AuxServices.java:49)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
      	at org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.postPut(NodeTimelineCollectorManager.java:122)
      	at org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.putIfAbsent(TimelineCollectorManager.java:95)
      	... 7 more
      Caused by: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
      	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1496)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1423)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
      	at com.sun.proxy.$Proxy108.getTimelineCollectorContext(Unknown Source)
      	at org.apache.hadoop.yarn.server.api.impl.pb.client.CollectorNodemanagerProtocolPBClientImpl.getTimelineCollectorContext(CollectorNodemanagerProtocolPBClientImpl.java:99)
      	at org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.updateTimelineCollectorContext(NodeTimelineCollectorManager.java:188)
      	at org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.postPut(NodeTimelineCollectorManager.java:116)
      	... 8 more
      Caused by: java.net.ConnectException: Connection refused
      	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
      	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
      	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
      	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
      	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:625)
      	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:723)
      	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
      	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1545)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1462)
      	... 14 more
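      The ten "Retrying connect" lines at the top of the log follow the policy named there, RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS). Read from the log, that amounts to one initial attempt and then up to 10 retries with a fixed 1000 ms sleep before each; the timestamps advance about one second per line (06:28:56 to 06:29:05), so the refused address costs roughly 10 seconds before the ERROR is logged. The plain-Java sketch below approximates that behavior; callWithFixedSleep is a hypothetical helper, not Hadoop's org.apache.hadoop.io.retry API, and the exact ordering of sleep and logging in Hadoop's Client may differ.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedSleepRetry {
    /**
     * Hypothetical helper (not Hadoop's RetryPolicies API): one initial attempt,
     * then up to maxRetries more, sleeping a fixed interval before each retry.
     */
    public static <T> T callWithFixedSleep(Callable<T> action, int maxRetries, long sleepMillis)
            throws Exception {
        int retries = 0;
        while (true) {
            try {
                return action.call();
            } catch (Exception e) {
                if (retries >= maxRetries) {
                    throw e; // give up: the last failure propagates to the caller
                }
                Thread.sleep(sleepMillis);
                System.out.println("Retrying. Already tried " + retries + " time(s)");
                retries++;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        try {
            // Simulates a server that always refuses; a short sleep keeps the demo fast.
            callWithFixedSleep(() -> {
                attempts.incrementAndGet();
                throw new ConnectException("Connection refused");
            }, 10, 10);
        } catch (Exception e) {
            System.out.println("Gave up after " + attempts.get() + " attempts: " + e);
        }
    }
}
```

      With maxRetries=10 the helper prints "Already tried 0" through "Already tried 9", matching the ten log lines, and gives up after 11 attempts in total.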
      

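      This surfaced when we switched the mini-YARN cluster to use port 0 for the node collector service. Port 0 lets the OS pick a free ephemeral port at bind time, but the per-node collector then tried to reach the NM at the configured address, asf904.gq1.ygridcore.net:0, rather than at the port actually bound, so every connection attempt was refused.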
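      The underlying rule is general to port 0: the configured value only tells the OS "pick any free port", and clients must be handed the port the server actually bound. The sketch below shows this with plain JDK sockets; EphemeralPortDemo and CONFIGURED_PORT are hypothetical names, and this is not the change made in the attached patches. It binds a ServerSocket to port 0, reads back the assigned port with getLocalPort(), and connects to that port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class EphemeralPortDemo {
    // Hypothetical stand-in for a configured address of the form "host:0".
    static final int CONFIGURED_PORT = 0;

    /** Binds a server to port 0, then connects to the port the OS actually assigned. */
    public static int bindEphemeralAndConnect() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket()) {
            // Port 0 asks the OS for any free ephemeral port.
            server.bind(new InetSocketAddress(loopback, CONFIGURED_PORT));
            int actualPort = server.getLocalPort(); // non-zero once bound
            // Connect using the bound port; as the log shows, "host:0" is refused.
            try (Socket client = new Socket()) {
                client.connect(new InetSocketAddress(loopback, actualPort), 1000);
            }
            return actualPort;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("configured port: " + CONFIGURED_PORT);
        System.out.println("bound port: " + bindEphemeralAndConnect());
    }
}
```

      In a mini cluster the same applies: whatever starts a service on port 0 has to publish the bound address (for example by writing it back into the configuration clients read) before those clients connect. How the patches for this issue handle it is not described here.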

      Also, the TestApplication tests are broken because the mocked context does not provide the Configuration object that ApplicationImpl depends on.
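      With mocking frameworks such as Mockito, an unstubbed method that returns an arbitrary object type returns null, so code that does something like context.getConf().get(...) fails with a NullPointerException. The sketch below reproduces that with plain JDK dynamic proxies so it runs without extra dependencies. Conf, Context, AppLike, and the "example.setting" key are hypothetical stand-ins, not the real NodeManager Context or ApplicationImpl, and the actual test changes in the attached patches may differ.

```java
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class MockedContextDemo {
    // Hypothetical minimal stand-ins, NOT the real NodeManager Context or ApplicationImpl.
    public interface Conf { String get(String key, String defaultValue); }
    public interface Context { Conf getConf(); }

    public static final class SimpleConf implements Conf {
        private final Map<String, String> values = new HashMap<>();
        public SimpleConf set(String key, String value) { values.put(key, value); return this; }
        @Override public String get(String key, String defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }
    }

    /** Reads configuration from the context when constructed. */
    public static final class AppLike {
        public final String setting;
        public AppLike(Context ctx) {
            // Throws NullPointerException if ctx.getConf() returns null.
            this.setting = ctx.getConf().get("example.setting", "default");
        }
    }

    /** Mimics an unstubbed mock: every method returns null. */
    public static Context bareMock() {
        return (Context) Proxy.newProxyInstance(Context.class.getClassLoader(),
                new Class<?>[] {Context.class}, (proxy, method, margs) -> null);
    }

    /** Mimics a mock whose getConf() has been stubbed to return a real configuration. */
    public static Context stubbedMock(Conf conf) {
        return (Context) Proxy.newProxyInstance(Context.class.getClassLoader(),
                new Class<?>[] {Context.class},
                (proxy, method, margs) -> "getConf".equals(method.getName()) ? conf : null);
    }

    public static boolean failsWithBareMock() {
        try {
            new AppLike(bareMock());
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("bare mock fails: " + failsWithBareMock());
        AppLike app = new AppLike(stubbedMock(new SimpleConf().set("example.setting", "x")));
        System.out.println("stubbed mock setting: " + app.setting);
    }
}
```

      In a Mockito-based test the equivalent is to stub the context's configuration getter so it returns a real configuration object before the class under test is constructed.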

      Attachments

        1. YARN-3634-YARN-2928.001.patch (5 kB, Sangjin Lee)
        2. YARN-3634-YARN-2928.002.patch (8 kB, Sangjin Lee)
        3. YARN-3634-YARN-2928.003.patch (8 kB, Sangjin Lee)
        4. YARN-3634-YARN-2928.004.patch (8 kB, Sangjin Lee)


            People

              Assignee: Sangjin Lee (sjlee0)
              Reporter: Sangjin Lee (sjlee0)
              Votes: 0
              Watchers: 8
