Hadoop YARN / YARN-5103

With NM recovery enabled, restarting NM multiple times results in AM restart


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: yarn
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The ApplicationMaster (AM) is restarted when the NodeManager (NM) is restarted multiple times, even though NM recovery is enabled.

      Log from the NM on which AM attempt 1 was running:
       ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(88)) - Unable to recover container container_e12_1463043063682_0002_01_000001
      java.io.IOException: java.lang.InterruptedException
      	at org.apache.hadoop.util.Shell.runCommand(Shell.java:579)
      	at org.apache.hadoop.util.Shell.run(Shell.java:487)
      	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
      	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:478)
      	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerProcessAlive(LinuxContainerExecutor.java:542)
      	at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:185)
      	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:445)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:83)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
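
The trace above shows an InterruptedException wrapped in an IOException while the NM was checking whether the recovered container's process was still alive. The following is a minimal sketch of how a recovery path could tell an interrupted probe apart from a genuine failure, so that an interrupt (for example, during another NM shutdown) is not treated as proof that the container is lost. It is an illustration only and is not the attached patch: the names RecoverySketch, LivenessProbe, Outcome, causedByInterrupt, and reacquire are hypothetical and do not exist in Hadoop.

```java
import java.io.IOException;

public class RecoverySketch {

    /** Hypothetical stand-in for a liveness check that shells out to the OS. */
    interface LivenessProbe {
        boolean isAlive() throws IOException;
    }

    /** Hypothetical result of trying to reacquire a container after restart. */
    enum Outcome { RUNNING, EXITED, INTERRUPTED, FAILED }

    /** Returns true if any throwable in the cause chain is an InterruptedException. */
    static boolean causedByInterrupt(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof InterruptedException) {
                return true;
            }
        }
        return false;
    }

    /** Probes the container once and classifies the result. */
    static Outcome reacquire(LivenessProbe probe) {
        try {
            return probe.isAlive() ? Outcome.RUNNING : Outcome.EXITED;
        } catch (IOException e) {
            if (causedByInterrupt(e)) {
                // Restore the interrupt status so callers can see it,
                // and report the interruption separately from a real failure.
                Thread.currentThread().interrupt();
                return Outcome.INTERRUPTED;
            }
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) {
        // Same shape as the logged error: IOException wrapping InterruptedException.
        LivenessProbe interrupted = () -> {
            throw new IOException(new InterruptedException());
        };
        System.out.println(reacquire(interrupted));
        Thread.interrupted(); // clear the flag set by reacquire before continuing

        System.out.println(reacquire(() -> true));
        System.out.println(reacquire(() -> {
            throw new IOException("probe failed");
        }));
    }
}
```

In this sketch, a caller that sees Outcome.INTERRUPTED could leave the container's recovery state untouched so that the next NM start can try again, rather than reporting the container as failed. Whether that matches the behavior of the committed fix is not stated in this report.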
      

      Attachments

        1. YARN-5103-v2.patch
          2 kB
          Junping Du
        2. YARN-5103-demo.patch
          2 kB
          Junping Du
        3. YARN-5103.patch
          2 kB
          Junping Du

          People

            Assignee: Junping Du (junping_du)
            Reporter: Sumana Sathish (ssathish@hortonworks.com)
            Votes: 0
            Watchers: 6
