Hadoop YARN / YARN-3524

MapReduce failed due to AM container-launch failure at NM on Windows


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.5.2
    • None
    • None
    • None
    • Environment: Windows Server 2012 and Windows 8
      Hadoop 2.5.2
      Java 1.7

    Description

      I tried to run a Tez job on a Windows machine.
      I successfully built Tez 0.6.0 against Hadoop 2.5.2, then configured Tez 0.6.0 as described in http://tez.apache.org/install.html.

      However, I get the following error when running the OrderedWordCount job.
      Note: I'm using a Hadoop High Availability (HA) setup.

      Running OrderedWordCount
      SLF4J: Class path contains multiple SLF4J bindings.
      SLF4J: Found binding in [jar:file:/C:/Hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/C:/Tez/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
      15/04/15 10:47:57 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.6.0, revision=${buildNumber}, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2015-04-15T01:13:02Z ]
      15/04/15 10:48:00 INFO client.TezClient: Submitting DAG application with id: application_1429073725727_0005
      15/04/15 10:48:00 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
      15/04/15 10:48:00 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://HACluster/apps/Tez/,hdfs://HACluster/apps/Tez/lib/
      15/04/15 10:48:01 INFO client.TezClient: Stage directory /tmp/app/tez/staging doesn't exist and is created
      15/04/15 10:48:01 INFO client.TezClient: Tez system stage directory hdfs://HACluster/tmp/app/tez/staging/.tez/application_1429073725727_0005 doesn't exist and is created
      15/04/15 10:48:02 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1429073725727_0005, dagName=OrderedWordCount
      15/04/15 10:48:03 INFO impl.YarnClientImpl: Submitted application application_1429073725727_0005
      15/04/15 10:48:03 INFO client.TezClient: The url to track the Tez AM: http://MASTER_NN1:8088/proxy/application_1429073725727_0005/
      15/04/15 10:48:03 INFO client.DAGClientImpl: Waiting for DAG to start running
      15/04/15 10:48:09 INFO client.DAGClientImpl: DAG completed. FinalState=FAILED
      OrderedWordCount failed with diagnostics: [Application application_1429073725727_0005 failed 2 times due to AM Container for appattempt_1429073725727_0005_000002 exited with  exitCode: -1073741515 due to: Exception from container-launch: ExitCodeException exitCode=-1073741515:
      ExitCodeException exitCode=-1073741515:
              at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
              at org.apache.hadoop.util.Shell.run(Shell.java:455)
              at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
              at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
              at java.util.concurrent.FutureTask.run(FutureTask.java:262)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:744)
      
              1 file(s) moved.
      
      Container exited with a non-zero exit code -1073741515
      .Failing this attempt.. Failing the application.]
      
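The exit code -1073741515 is easier to interpret when its 32 bits are read as an unsigned Windows status value. The sketch below (the class and method names are hypothetical) prints that bit pattern in hex: -1073741515 becomes 0xC0000135, which the Windows ntstatus.h header defines as STATUS_DLL_NOT_FOUND. That may point at a DLL that the launched process could not load; treat it as a lead to check on SLAVE1, not as a confirmed diagnosis.

```java
import java.util.Locale;

/** Hypothetical helper: reinterprets a signed process exit code as a Windows NTSTATUS value. */
final class ExitCodeDecoder {
    // 0xC0000135 is STATUS_DLL_NOT_FOUND in ntstatus.h; as a Java int it equals -1073741515.
    static final int STATUS_DLL_NOT_FOUND = 0xC0000135;

    private ExitCodeDecoder() {}

    /** Formats the 32-bit two's-complement pattern of the exit code as 0xXXXXXXXX. */
    static String toHex(int exitCode) {
        // %X on a negative int prints the unsigned 32-bit value.
        return String.format(Locale.ROOT, "0x%08X", exitCode);
    }

    /** Names the status codes this sketch knows about; everything else is "unknown". */
    static String describe(int exitCode) {
        if (exitCode == STATUS_DLL_NOT_FOUND) {
            return "STATUS_DLL_NOT_FOUND";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        int exitCode = -1073741515; // value reported for the AM container above
        System.out.println(toHex(exitCode) + " " + describe(exitCode));
    }
}
```

Running `main` prints `0xC0000135 STATUS_DLL_NOT_FOUND`.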

      In the ResourceManager log:

      2015-04-19 21:49:57,533 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1429505171727_0001_02_000001, NodeId: SLAVE1:57794, NodeHttpAddress: SLAVE1:8042, Resource: <memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 172.16.100.92:57794 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=<memory:8192, vCores:8>
      2015-04-19 21:49:57,533 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:8192, vCores:8>
      2015-04-19 21:49:57,533 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0
      2015-04-19 21:49:57,533 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1429505171727_0001_000002 released container container_1429505171727_0001_02_000001 on node: host: SLAVE1:57794 #containers=0 available=8192 used=0 with event: FINISHED
      2015-04-19 21:49:57,580 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: NodeDataChanged with state:UserConnected for path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1429505171727_0001/appattempt_1429505171727_0001_000002 for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
      2015-04-19 21:49:57,580 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1429505171727_0001_000002
      2015-04-19 21:49:57,580 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1429505171727_0001_000002 State change from FINAL_SAVING to FAILED
      2015-04-19 21:49:57,580 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1429505171727_0001 with final state: FAILED
      2015-04-19 21:49:57,580 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1429505171727_0001 State change from ACCEPTED to FINAL_SAVING
      2015-04-19 21:49:57,580 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1429505171727_0001
      2015-04-19 21:49:57,580 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Attempt appattempt_1429505171727_0001_000002 is done. finalState=FAILED
      2015-04-19 21:49:57,580 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1429505171727_0001 requests cleared
      2015-04-19 21:49:57,580 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application removed - appId: application_1429505171727_0001 user: SYSTEM queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
      2015-04-19 21:49:57,611 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: NodeDataChanged with state:UserConnected for path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1429505171727_0001 for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
      2015-04-19 21:49:57,611 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1429505171727_0001 failed 2 times due to AM Container for appattempt_1429505171727_0001_000002 exited with  exitCode: -1073741515 due to: Exception from container-launch: ExitCodeException exitCode=-1073741515: 
      ExitCodeException exitCode=-1073741515: 
      	at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
      	at org.apache.hadoop.util.Shell.run(Shell.java:455)
      	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
      	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      
              1 file(s) moved.
      
      Container exited with a non-zero exit code -1073741515
      .Failing this attempt.. Failing the application.
      2015-04-19 21:49:57,627 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1429505171727_0001 State change from FINAL_SAVING to FAILED
      2015-04-19 21:49:57,627 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1429505171727_0001 user: SYSTEM leaf-queue of parent: root #applications: 0
      2015-04-19 21:49:57,627 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=SYSTEM	OPERATION=Application Finished - Failed	TARGET=RMAppManager	RESULT=FAILURE	DESCRIPTION=App failed with state: FAILED	PERMISSIONS=Application application_1429505171727_0001 failed 2 times due to AM Container for appattempt_1429505171727_0001_000002 exited with  exitCode: -1073741515 due to: Exception from container-launch: ExitCodeException exitCode=-1073741515: 
      ExitCodeException exitCode=-1073741515: 
      	at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
      	at org.apache.hadoop.util.Shell.run(Shell.java:455)
      	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
      	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      
              1 file(s) moved.
      
      Container exited with a non-zero exit code -1073741515
      .Failing this attempt.. Failing the application.	APPID=application_1429505171727_0001
      2015-04-19 21:49:57,627 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1429505171727_0001,name=OrderedWordCount,user=SYSTEM,queue=default,state=FAILED,trackingUrl=http://MASTER_NN1:8088/cluster/app/application_1429505171727_0001,appMasterHost=N/A,startTime=1429505386589,finishTime=1429505397580,finalStatus=FAILED
      2015-04-19 21:49:58,580 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8032: readAndProcess from client 172.16.100.XX threw exception [java.io.IOException: An existing connection was forcibly closed by the remote host]
      
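The RMAppManager$ApplicationSummary line packs the outcome of the application into comma-separated key=value pairs. A minimal sketch (hypothetical class name, assuming no value contains a comma, which holds for the line above) splits the text after "ApplicationSummary: " into a map and computes the run time from the epoch-millisecond startTime and finishTime fields. For this run that is 1429505397580 - 1429505386589 = 10991 ms, so the application was marked FAILED about 11 seconds after it started, consistent with both AM attempts failing at launch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses the key=value payload of an ApplicationSummary log line. */
final class AppSummaryParser {
    private AppSummaryParser() {}

    /** Splits "k1=v1,k2=v2,..." into an ordered map; assumes values contain no commas. */
    static Map<String, String> parse(String payload) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : payload.split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return fields;
    }

    /** Wall-clock run time in milliseconds between startTime and finishTime. */
    static long durationMillis(Map<String, String> fields) {
        return Long.parseLong(fields.get("finishTime")) - Long.parseLong(fields.get("startTime"));
    }

    public static void main(String[] args) {
        // Payload copied from the ApplicationSummary line in the ResourceManager log above.
        String payload = "appId=application_1429505171727_0001,name=OrderedWordCount,user=SYSTEM,"
                + "queue=default,state=FAILED,"
                + "trackingUrl=http://MASTER_NN1:8088/cluster/app/application_1429505171727_0001,"
                + "appMasterHost=N/A,startTime=1429505386589,finishTime=1429505397580,finalStatus=FAILED";
        Map<String, String> fields = parse(payload);
        System.out.println(fields.get("state") + " after " + durationMillis(fields) + " ms");
    }
}
```

Running `main` prints `FAILED after 10991 ms`.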

      In the NodeManager log:

      2015-04-20 10:19:59,365 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [C:\Hadoop\bin\winutils.exe, task, create, container_1429505171727_0001_02_000001, cmd /c /tmp/hadoop-SLAVE1$/nm-local-dir/usercache/SYSTEM/appcache/application_1429505171727_0001/container_1429505171727_0001_02_000001/default_container_executor.cmd]
      2015-04-20 10:19:59,436 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1429505171727_0001_02_000001 is : -1073741515
      2015-04-20 10:19:59,437 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1429505171727_0001_02_000001 and exit code: -1073741515
      ExitCodeException exitCode=-1073741515: 
      	at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
      	at org.apache.hadoop.util.Shell.run(Shell.java:455)
      	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
      	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      2015-04-20 10:19:59,438 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:         1 file(s) moved.
      
      2015-04-20 10:19:59,439 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code -1073741515
      2015-04-20 10:19:59,439 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1429505171727_0001_02_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
      2015-04-20 10:19:59,440 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1429505171727_0001_02_000001
      2015-04-20 10:19:59,480 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop-SLAVE1$/nm-local-dir/usercache/SYSTEM/appcache/application_1429505171727_0001/container_1429505171727_0001_02_000001
      2015-04-20 10:19:59,480 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=SYSTEM	OPERATION=Container Finished - Failed	TARGET=ContainerImpl	RESULT=FAILURE	DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE	APPID=application_1429505171727_0001	CONTAINERID=container_1429505171727_0001_02_000001
      2015-04-20 10:19:59,481 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1429505171727_0001_02_000001 transitioned from EXITED_WITH_FAILURE to DONE
      2015-04-20 10:19:59,481 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1429505171727_0001_02_000001 from application application_1429505171727_0001
      2015-04-20 10:19:59,481 INFO org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
      
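The NodeManager log shows the AM container being started through C:\Hadoop\bin\winutils.exe (task create ... cmd /c default_container_executor.cmd) and failing about 70 ms later. One way to narrow this down, sketched below under the assumption that it runs on SLAVE1 under the same account as the NodeManager, is to start winutils.exe on its own and look at its exit status. If it already exits with 0xC0000135 (STATUS_DLL_NOT_FOUND), the executable itself is likely failing to load; if not, the failure is more likely in a process it launches. The class name is hypothetical, and the sketch does not replay the full task create invocation, since the container directory is deleted during cleanup, as the log shows.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;

/** Hypothetical diagnostic: runs an executable with no arguments and reports its exit status in hex. */
final class LaunchProbe {
    // Windows ntstatus.h value for STATUS_DLL_NOT_FOUND (-1073741515 as a Java int).
    static final int STATUS_DLL_NOT_FOUND = 0xC0000135;

    private LaunchProbe() {}

    /** Formats a Process.waitFor() status as hex and flags STATUS_DLL_NOT_FOUND. */
    static String classify(int status) {
        String hex = String.format(Locale.ROOT, "0x%08X", status);
        if (status == STATUS_DLL_NOT_FOUND) {
            return hex + ": STATUS_DLL_NOT_FOUND, a required DLL could not be loaded";
        }
        return hex + ": not STATUS_DLL_NOT_FOUND";
    }

    /** Starts the executable, drains its merged output, and returns its exit status. */
    static int run(String executable) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(executable).redirectErrorStream(true).start();
        try (InputStream out = p.getInputStream()) {
            byte[] buf = new byte[8192];
            while (out.read(buf) != -1) {
                // Discard output so the child cannot block on a full pipe.
            }
        }
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Default path taken from the NodeManager log; pass another path as the first argument.
        String winutils = args.length > 0 ? args[0] : "C:\\Hadoop\\bin\\winutils.exe";
        System.out.println(classify(run(winutils)));
    }
}
```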

      The problem might be that, while connecting to the NodeManager, it is unable to complete the handshake with the ResourceManager.

      If I run the same job on a single-node Hadoop cluster, it works correctly.

          People

            Assignee: Unassigned
            Reporter: Kaveen Raajan (KaveenBigdata)
            Votes: 3
            Watchers: 6
