Hadoop YARN / YARN-4744

Too many signal-to-container failures with LCE (LinuxContainerExecutor)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Steps to reproduce:

        1. Install an HA cluster in secure mode.
        2. Enable LCE with cgroups.
        3. Start the servers as user dsperf.
        4. Submit a MapReduce application (terasort/teragen) as user yarn or dsperf.

      Result: the NodeManager log shows too many "Signal container failed" errors.

      When the application is submitted as that user, the following exception is thrown:

      2014-03-02 09:20:38,689 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for testing (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
      2014-03-02 09:20:40,158 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_e02_1393731146548_0001_01_000013
      2014-03-02 09:20:43,071 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_e02_1393731146548_0001_01_000009 succeeded
      2014-03-02 09:20:43,072 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_e02_1393731146548_0001_01_000009 transitioned from RUNNING to EXITED_WITH_SUCCESS
      2014-03-02 09:20:43,073 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_e02_1393731146548_0001_01_000009
      2014-03-02 09:20:43,075 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime: Using container runtime: DefaultLinuxContainerRuntime
      2014-03-02 09:20:43,081 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 9. Privileged Execution Operation Output:
      main : command provided 2
      main : run as user is yarn
      main : requested yarn user is yarn
      Full command array for failed execution:
      [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, yarn, yarn, 2, 9370, 15]
      2014-03-02 09:20:43,081 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime: Signal container failed. Exception:
      org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=9:
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
              at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: ExitCodeException exitCode=9:
              at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
              at org.apache.hadoop.util.Shell.run(Shell.java:838)
              at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
              ... 9 more
      2014-03-02 09:20:43,113 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=yarn OPERATION=Container Finished - Succeeded        TARGET=ContainerImpl    RESULT=SUCCESS  APPID=application_1393731146548_0001    CONTAINERID=container_e02_1393731146548_0001_01_000009
      2014-03-02 09:20:43,115 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_e02_1393731146548_0001_01_000009 transitioned from EXITED_WITH_SUCCESS to DONE
      2014-03-02 09:20:43,115 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_e02_1393731146548_0001_01_000009 from application application_1393731146548_0001
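
The "Full command array for failed execution" line in the log above is easier to read once its positional fields are named. The following is a minimal sketch, not NodeManager code: the function name decode_signal_command and the field names are hypothetical, and the command-code table is an assumption about container-executor (only code 2 is actually exercised by this log, where the stack trace confirms a signalContainer call).

```python
import signal

# Assumed mapping of container-executor command codes. Only code 2 appears
# in the log above; the other entries are illustrative, not authoritative.
COMMANDS = {
    0: "INITIALIZE_CONTAINER",
    1: "LAUNCH_CONTAINER",
    2: "SIGNAL_CONTAINER",
    3: "DELETE_AS_USER",
}


def decode_signal_command(array_text):
    """Split '[exe, run_as, user, cmd, pid, sig]' into named fields."""
    parts = [p.strip() for p in array_text.strip().strip("[]").split(",")]
    executable, run_as_user, requested_user, command, pid, signum = parts
    command = int(command)
    return {
        "executable": executable,
        "run_as_user": run_as_user,
        "requested_user": requested_user,
        "command": COMMANDS.get(command, "UNKNOWN(%d)" % command),
        "pid": int(pid),
        # Translate the numeric signal into its symbolic name.
        "signal": signal.Signals(int(signum)).name,
    }


# The command array copied verbatim from the log above.
line = ("[/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/"
        "container-executor, yarn, yarn, 2, 9370, 15]")
decoded = decode_signal_command(line)
print(decoded)
```

Read this way, the failing call is a request to send signal 15 (SIGTERM) to PID 9370 as user yarn, issued about 10 ms after container_e02_1393731146548_0001_01_000009 was logged as succeeded. A plausible reading, stated here as an assumption rather than a confirmed diagnosis, is that exit code 9 reports a PID that no longer exists because the container process had already exited.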
      
      

      Checked the same scenario in version 2.7.2: the failure is not present there.
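
To put a number on "too many", or to compare NodeManager logs from two versions, one could count the "Signal container failed" warnings and tally the exit code reported on the line that follows each one. This is a rough sketch that assumes the log layout shown in the description; count_signal_failures is a hypothetical helper, not part of Hadoop.

```python
import re
from collections import Counter

FAILURE_MARKER = "Signal container failed"
EXIT_CODE = re.compile(r"ExitCodeException exitCode=(\d+)")


def count_signal_failures(lines):
    """Return (failure_count, Counter of exit codes following each failure)."""
    failures = 0
    exit_codes = Counter()
    awaiting_code = False
    for line in lines:
        if FAILURE_MARKER in line:
            failures += 1
            awaiting_code = True
            continue
        if awaiting_code:
            # Only the line directly after the warning is inspected, so the
            # later "Caused by:" line is not counted a second time.
            match = EXIT_CODE.search(line)
            if match:
                exit_codes[int(match.group(1))] += 1
            awaiting_code = False
    return failures, exit_codes


# Three lines abbreviated from the log in the description.
sample = [
    "2014-03-02 09:20:43,081 WARN ...DefaultLinuxContainerRuntime: "
    "Signal container failed. Exception:",
    "...PrivilegedOperationException: ExitCodeException exitCode=9:",
    "Caused by: ExitCodeException exitCode=9:",
]
failures, exit_codes = count_signal_failures(sample)
print(failures, dict(exit_codes))
```

Because the function accepts any iterable of lines, an open file object for a NodeManager log can be passed in directly.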

      Attachments

        1. YARN-4744.001.patch
          6 kB
          Sidharta Seethana
        2. YARN-4744.002.patch
          16 kB
          Sidharta Seethana

          People

            Assignee: Sidharta Seethana (sidharta-s)
            Reporter: Bibin Chundatt (bibinchundatt)
            Votes: 0
            Watchers: 12
