  1. Hadoop YARN
  2. YARN-5131

Distributed shell AM fails when extra container arrives during finishing

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      Because of YARN-1902, an extra container can be allocated to the AM while it is finishing, which causes the AM to fail.

      Logs look like:

      16/05/17 07:58:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_e44_1463470957478_0018_01_000007, containerNode=host1:25454, containerNodeURI=host1:8042, containerResourceMemory3072, containerResourceVirtualCores1
      16/05/17 07:58:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_e44_1463470957478_0018_01_000007
      16/05/17 07:58:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_e44_1463470957478_0018_01_000007
      16/05/17 07:58:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : host1:25454
      .......
      16/05/17 07:58:39 INFO distributedshell.ApplicationMaster: Application completed. Stopping running containers
      16/05/17 07:58:39 INFO impl.NMClientAsyncImpl: NM Client is being stopped.
      16/05/17 07:58:39 INFO impl.NMClientAsyncImpl: Waiting for eventDispatcherThread to be interrupted.
      16/05/17 07:58:39 INFO impl.NMClientAsyncImpl: eventDispatcherThread exited.
      16/05/17 07:58:39 ERROR distributedshell.ApplicationMaster: Failed to start Container container_e44_1463470957478_0018_01_000007
      16/05/17 07:58:39 INFO impl.NMClientAsyncImpl: Stopping NM client.
      ........
      16/05/17 07:58:39 INFO distributedshell.ApplicationMaster: Diagnostics., total=5, completed=6, allocated=6, failed=1
      16/05/17 07:58:39 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
      16/05/17 07:58:40 INFO distributedshell.ApplicationMaster: Application Master failed. exiting
      16/05/17 07:58:40 INFO impl.AMRMClientAsyncImpl: Interrupted while waiting for queue
      java.lang.InterruptedException
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
              at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
              at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)
      End of LogType:AppMaster.stde
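
      In the log above, the diagnostics line reports total=5 but allocated=6, and the surplus container is the one that fails to start while the NM client is shutting down. One way to guard against this is to launch only as many containers as were requested and hand any surplus back. The following is a minimal sketch of that idea, not the committed patch: the class ExtraContainerGuard, its fields, and the string container IDs are hypothetical stand-ins, and no Hadoop classes are used. In a real AM, returning the surplus would presumably go through AMRMClientAsync.releaseAssignedContainer(ContainerId).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical stand-in for an AM allocation callback.
 * Illustrates capping launches at the requested total; it is not the
 * DistributedShell ApplicationMaster code.
 */
public class ExtraContainerGuard {
    private final int numTotalContainers;                          // containers the app asked for
    private final AtomicInteger numAllocatedContainers = new AtomicInteger();
    private final List<String> launched = new ArrayList<>();       // IDs we would start
    private final List<String> released = new ArrayList<>();       // IDs we would give back

    public ExtraContainerGuard(int numTotalContainers) {
        this.numTotalContainers = numTotalContainers;
    }

    /** Called for every batch of containers handed over by the RM. */
    public synchronized void onContainersAllocated(List<String> containerIds) {
        for (String id : containerIds) {
            if (numAllocatedContainers.get() >= numTotalContainers) {
                // Surplus container: do not launch it and do not count it.
                // Assumption: a real AM would release it back to the RM here.
                released.add(id);
            } else {
                numAllocatedContainers.incrementAndGet();
                launched.add(id);
            }
        }
    }

    public List<String> getLaunched() {
        return launched;
    }

    public List<String> getReleased() {
        return released;
    }

    public int getAllocatedCount() {
        return numAllocatedContainers.get();
    }

    public static void main(String[] args) {
        ExtraContainerGuard guard = new ExtraContainerGuard(5);
        // Five requested containers arrive first, then one extra.
        guard.onContainersAllocated(Arrays.asList("c2", "c3", "c4", "c5", "c6"));
        guard.onContainersAllocated(Arrays.asList("c7"));
        System.out.println("launched=" + guard.getLaunched().size()
                + ", released=" + guard.getReleased());
        // prints "launched=5, released=[c7]"
    }
}
```

      With a guard of this kind, the allocated count never exceeds the requested total, so a late extra container cannot be counted as a failed launch during shutdown.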
      

            People

            • Assignee: Wangda Tan (leftnoteasy)
            • Reporter: Sumana Sathish (ssathish@hortonworks.com)