Hadoop YARN / YARN-5131

Distributed shell AM fails when extra container arrives during finishing


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

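      Because of YARN-1902, an extra container can be allocated to the AM while it is finishing. The AM then tries to launch it, the launch fails, and the AM reports the whole application as failed.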
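A minimal sketch of one way an AM could guard against this, assuming the remedy is to hand surplus or late allocations back to the RM instead of launching them. This is not necessarily what YARN-5131.1.patch does. `SurplusContainerGuard`, `onContainerAllocated` and `markDone` are hypothetical names, and container IDs are plain strings so the block runs without Hadoop on the classpath. In a real AM the release branch would correspond to a call to `AMRMClientAsync#releaseAssignedContainer(ContainerId)`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of an AM's allocation bookkeeping. Surplus or late
 * containers are recorded for release instead of being launched. In a real
 * AM the release branch would call AMRMClientAsync#releaseAssignedContainer.
 */
public class SurplusContainerGuard {
  private final int numTotalContainers;
  private int numAllocatedContainers = 0;
  private boolean done = false;
  private final List<String> released = new ArrayList<>();

  public SurplusContainerGuard(int numTotalContainers) {
    this.numTotalContainers = numTotalContainers;
  }

  /** Called once the AM starts finishing; later allocations are released. */
  public synchronized void markDone() {
    done = true;
  }

  /**
   * Returns true if the container should be launched, or false if it was
   * recorded for release (quota already reached or AM finishing).
   */
  public synchronized boolean onContainerAllocated(String containerId) {
    if (done || numAllocatedContainers >= numTotalContainers) {
      released.add(containerId);
      return false;
    }
    numAllocatedContainers++;
    return true;
  }

  public synchronized int allocated() {
    return numAllocatedContainers;
  }

  public synchronized List<String> released() {
    return new ArrayList<>(released);
  }

  public static void main(String[] args) {
    // Mirror the log: 5 containers requested, 6 allocated (IDs 2..7; 1 is the AM).
    SurplusContainerGuard guard = new SurplusContainerGuard(5);
    int launched = 0;
    for (int i = 2; i <= 7; i++) {
      if (guard.onContainerAllocated("container_" + i)) {
        launched++;
      }
    }
    // prints "launched=5, released=[container_7]"
    System.out.println("launched=" + launched + ", released=" + guard.released());
  }
}
```

Because the sixth allocation is released rather than started, it never reaches the NM client, so stopping that client during finish cannot turn it into a failed container.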

      The AM logs look like this:

      16/05/17 07:58:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_e44_1463470957478_0018_01_000007, containerNode=host1:25454, containerNodeURI=host1:8042, containerResourceMemory3072, containerResourceVirtualCores1
      16/05/17 07:58:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_e44_1463470957478_0018_01_000007
      16/05/17 07:58:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_e44_1463470957478_0018_01_000007
      16/05/17 07:58:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : host1:25454
      .......
      16/05/17 07:58:39 INFO distributedshell.ApplicationMaster: Application completed. Stopping running containers
      16/05/17 07:58:39 INFO impl.NMClientAsyncImpl: NM Client is being stopped.
      16/05/17 07:58:39 INFO impl.NMClientAsyncImpl: Waiting for eventDispatcherThread to be interrupted.
      16/05/17 07:58:39 INFO impl.NMClientAsyncImpl: eventDispatcherThread exited.
      16/05/17 07:58:39 ERROR distributedshell.ApplicationMaster: Failed to start Container container_e44_1463470957478_0018_01_000007
      16/05/17 07:58:39 INFO impl.NMClientAsyncImpl: Stopping NM client.
      ........
      16/05/17 07:58:39 INFO distributedshell.ApplicationMaster: Diagnostics., total=5, completed=6, allocated=6, failed=1
      16/05/17 07:58:39 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
      16/05/17 07:58:40 INFO distributedshell.ApplicationMaster: Application Master failed. exiting
      16/05/17 07:58:40 INFO impl.AMRMClientAsyncImpl: Interrupted while waiting for queue
      java.lang.InterruptedException
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
              at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
              at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)
      End of LogType:AppMaster.stde
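The "Diagnostics." line explains the failure verdict: total=5, allocated=6, completed=6, failed=1 is consistent with the five requested shell commands finishing and the extra container counting as one completed-but-failed container. The sketch below compares a strict rule, assumed here for illustration, that demands zero failures and exactly `total` completions, with a more tolerant rule that only requires `completed - failed >= total`. Both method names are hypothetical, and neither rule is claimed to be the exact code before or after YARN-5131.1.patch.

```java
/**
 * Hypothetical comparison of two ways an AM could turn its container
 * counters into a final verdict, applied to the counters from the log.
 */
public class FinishVerdict {
  /** Strict rule (assumed): no failures and exactly the requested completions. */
  static boolean strictSuccess(int total, int completed, int failed) {
    return failed == 0 && completed == total;
  }

  /** Tolerant rule: at least `total` containers completed without failing. */
  static boolean tolerantSuccess(int total, int completed, int failed) {
    return completed - failed >= total;
  }

  public static void main(String[] args) {
    // Counters from the "Diagnostics." log line.
    int total = 5, completed = 6, failed = 1;
    System.out.println("strict:   " + strictSuccess(total, completed, failed));   // false
    System.out.println("tolerant: " + tolerantSuccess(total, completed, failed)); // true
  }
}
```

The tolerant rule has a trade-off: it would also report success if one requested container failed while an extra one succeeded, so releasing surplus containers at allocation time is the more precise guard.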
      

      Attachments

        1. YARN-5131.1.patch (1 kB, Wangda Tan)


          People

            Assignee: Wangda Tan (leftnoteasy)
            Reporter: Sumana Sathish (ssathish@hortonworks.com)
            Votes: 0
            Watchers: 5
