Hadoop YARN / YARN-5937

stop-yarn.sh is not able to gracefully stop node managers

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha2
    • Component/s: None
    • Labels:

      Description

      stop-yarn.sh always produces the following output:

      ./sbin/stop-yarn.sh
      Stopping resourcemanager
      Stopping nodemanagers
      <NM_HOST>: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
      <NM_HOST>: ERROR: Unable to kill 18097
      

      This happens because the resource manager is stopped before the node managers. When the shutdown hook manager tries to gracefully stop the NM services, the NM needs to unregister with the RM, and that call times out because the NM cannot connect to the RM (already stopped). See the log below (stop the RM, then run kill <nm_pid>):

      16/11/28 08:26:43 ERROR nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
      ...
      16/11/28 08:26:53 WARN util.ShutdownHookManager: ShutdownHook 'CompositeServiceShutdownHook' timeout, java.util.concurrent.TimeoutException
      java.util.concurrent.TimeoutException
      	at java.util.concurrent.FutureTask.get(FutureTask.java:205)
      	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
      ...
      	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:291)
      ...
      16/11/28 08:27:13 ERROR util.ShutdownHookManager: ShutdownHookManger shutdown forcefully.
      

      The shutdown hook has a default timeout of 10s, so if the RM is stopped before the NMs, the NMs always take more than 10s to stop (in the Java code). However, stop-yarn.sh only allows 5s, so the NM is always killed instead of being stopped gracefully.
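
The timeout mismatch described above can be illustrated with a small stand-alone sketch. This is not the Hadoop implementation: `stop_with_timeout` is a hypothetical helper that only mirrors the behaviour visible in the output (SIGTERM, a bounded wait, then `kill -9`). A process that needs longer than the allowed wait to exit is always force-killed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not Hadoop code):
# send SIGTERM, wait up to "timeout" seconds, then fall back to SIGKILL.
stop_with_timeout() {
  local pid=$1 timeout=$2 waited=0

  # If the process is already gone there is nothing to stop.
  kill -TERM "$pid" 2>/dev/null || return 0

  # Poll once per second until the process exits or the timeout elapses.
  while kill -0 "$pid" 2>/dev/null; do
    if (( waited >= timeout )); then
      echo "WARNING: $pid did not stop gracefully after ${timeout} seconds: Trying to kill with kill -9"
      kill -9 "$pid" 2>/dev/null
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done

  echo "$pid stopped gracefully"
  return 0
}
```

With a 5-second wait and a shutdown hook that blocks for about 10 seconds, the second branch is the one that is always taken, which matches the warning shown in the description.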

      It would make sense for this script to stop the NMs before the RM, so that the NMs can shut down gracefully.
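
The proposed ordering can be sketched as follows. This is a simplified illustration, not the contents of the attached patch: the function name `stop_yarn_in_order` and the `YARN_BIN` variable are assumptions, and `YARN_BIN` defaults to a dry run (`echo yarn`) so the order can be inspected without a cluster. The `--workers` and `--daemon stop` options are assumed to behave as in the Hadoop 3 `yarn` command.

```shell
#!/usr/bin/env bash
# Sketch only: the real stop-yarn.sh differs in detail.
# YARN_BIN is a hypothetical variable; by default it just prints the command.
YARN_BIN=${YARN_BIN:-echo yarn}

stop_yarn_in_order() {
  # 1. Stop the NodeManagers first, while the ResourceManager is still up
  #    and can answer their unregister requests.
  echo "Stopping nodemanagers"
  ${YARN_BIN} --workers --daemon stop nodemanager

  # 2. Stop the ResourceManager last.
  echo "Stopping resourcemanager"
  ${YARN_BIN} --daemon stop resourcemanager
}
```

Calling `stop_yarn_in_order` in the default dry-run mode prints the two stop commands in the order they would be issued.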

      1. nm_shutdown.log
        21 kB
        Weiwei Yang
      2. YARN-5937.01.patch
        1.0 kB
        Weiwei Yang

        Activity

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote | Subsystem | Runtime | Comment
        0 | reexec | 0m 15s | Docker mode activated.
        +1 | @author | 0m 0s | The patch does not contain any @author tags.
        -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 | mvninstall | 8m 22s | trunk passed
        +1 | mvnsite | 3m 31s | trunk passed
        +1 | mvnsite | 3m 33s | the patch passed
        +1 | shellcheck | 0m 13s | The patch generated 0 new + 116 unchanged - 1 fixed = 116 total (was 117)
        +1 | shelldocs | 0m 11s | There were no new shelldocs issues.
        +1 | whitespace | 0m 0s | The patch has no whitespace issues.
        +1 | unit | 5m 41s | hadoop-yarn in the patch passed.
        +1 | asflicense | 0m 33s | The patch does not generate ASF License warnings.
        | | 22m 46s |



        Subsystem | Report/Notes
        Docker | Image:yetus/hadoop:a9ad5d6
        JIRA Issue | YARN-5937
        JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12840675/YARN-5937.01.patch
        Optional Tests | asflicense mvnsite unit shellcheck shelldocs
        uname | Linux 4d1ba7995b5b 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool | maven
        Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision | trunk / 1f7613b
        shellcheck | v0.4.5
        Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14142/testReport/
        modules | C: hadoop-yarn-project/hadoop-yarn U: hadoop-yarn-project/hadoop-yarn
        Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14142/console
        Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        cheersyang Weiwei Yang added a comment -

        Hello Naganarasimha G R

        Do you have any suggestions on the fix I proposed? Would you like to review this patch? Thanks a lot.

        Naganarasimha Naganarasimha G R added a comment -

        Sorry for the delayed reply. I was looking into this because, in the normal case too, the NM was not shutting down gracefully for me. I have not tested the trunk code of late. Let me test whether the problem is there; if it is, we can fix both issues together. The existing solution seems fine to me!

        cheersyang Weiwei Yang added a comment -

        Hello Naganarasimha G R

        Thanks a lot for looking into this one, any updates?

        Naganarasimha Naganarasimha G R added a comment -

        Thanks Weiwei Yang,
        Sorry for the delay. I just verified the trunk code; the problem I saw was due to my own trunk code. Your approach is fine, and I will commit it shortly.

        cheersyang Weiwei Yang added a comment -

        Perfect, thank you Naganarasimha G R

        Naganarasimha Naganarasimha G R added a comment -

        Thanks for the contribution, Weiwei Yang. I have committed it to trunk.

        hudson Hudson added a comment -

        FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #11098 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11098/)
        YARN-5937. stop-yarn.sh is not able to gracefully stop node managers. (naganarasimha_gr: rev 41db07d532f41fd35b11935b2bb042973831951b)

        • (edit) hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.sh

          People

          • Assignee: cheersyang Weiwei Yang
          • Reporter: cheersyang Weiwei Yang
          • Votes: 0
          • Watchers: 5
