Hadoop YARN / YARN-5937

stop-yarn.sh is not able to gracefully stop node managers


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha2
    • Component/s: None

    Description

      stop-yarn.sh always gives the following output:

      ./sbin/stop-yarn.sh
      Stopping resourcemanager
      Stopping nodemanagers
      <NM_HOST>: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
      <NM_HOST>: ERROR: Unable to kill 18097
      

      This is because the ResourceManager is stopped before the NodeManagers. When the shutdown hook manager tries to gracefully stop the NM services, the NM needs to unregister with the RM, and that call times out because the NM cannot connect to the RM, which has already stopped. See the log below (stop the RM, then run kill <nm_pid>):

      16/11/28 08:26:43 ERROR nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
      ...
      16/11/28 08:26:53 WARN util.ShutdownHookManager: ShutdownHook 'CompositeServiceShutdownHook' timeout, java.util.concurrent.TimeoutException
      java.util.concurrent.TimeoutException
      	at java.util.concurrent.FutureTask.get(FutureTask.java:205)
      	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
      ...
      	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:291)
      ...
      16/11/28 08:27:13 ERROR util.ShutdownHookManager: ShutdownHookManger shutdown forcefully.
      

      The shutdown hook has a default timeout of 10 seconds, so if the RM is stopped before the NMs, the NMs always take more than 10 seconds to stop (in the Java code). However, stop-yarn.sh only allows 5 seconds, so the NM is always killed instead of being stopped gracefully.
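      The stop sequence behind the output above (send SIGTERM, wait, then fall back to kill -9) can be sketched as follows. This is a minimal illustration under stated assumptions, not Hadoop's actual script code: the helper name stop_with_timeout and its once-per-second polling loop are hypothetical, and the timeout argument stands in for the 5-second window that stop-yarn.sh allows. Any process whose shutdown path takes longer than that window, such as an NM stuck waiting on a stopped RM, falls through to the kill -9 branch.

```shell
#!/usr/bin/env bash
# Hypothetical helper sketching a TERM-then-KILL stop pattern.
# It is NOT Hadoop's implementation; names and polling are assumptions.
# Usage: stop_with_timeout <pid> <timeout_seconds>
# Returns 0 if the process exited after SIGTERM, 1 if kill -9 was needed.
stop_with_timeout() {
  local pid=$1
  local timeout=$2

  # Ask the process to shut down; a JVM runs its shutdown hooks on SIGTERM.
  # If the process is already gone, treat it as stopped.
  kill -TERM "${pid}" 2>/dev/null || return 0

  # Poll once per second for up to <timeout> seconds.
  local waited=0
  while (( waited < timeout )); do
    if ! kill -0 "${pid}" 2>/dev/null; then
      echo "stopped gracefully"
      return 0
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done

  # Still alive: the graceful window was shorter than the shutdown path.
  echo "WARNING: did not stop gracefully after ${timeout} seconds: Trying to kill with kill -9"
  kill -KILL "${pid}" 2>/dev/null
  return 1
}
```

      With a 5-second window, a process that needs 10 seconds or more to finish its shutdown hook always ends up in the kill -9 branch, which matches the WARNING line in the output above.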

      It would make sense for this script to stop the NMs before the RMs, so that the NMs can still unregister and shut down gracefully.
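      The proposed ordering can be sketched as below, assuming the fix simply swaps the two stop calls. YARN_CMD and stop_yarn_in_order are hypothetical names introduced for illustration, and the --workers and --daemon stop options are assumed to be the Hadoop 3 yarn launcher options; this is not the content of YARN-5937.01.patch. The sketch also leaves out anything else a real stop script may handle, such as multiple ResourceManagers.

```shell
#!/usr/bin/env bash
# Sketch of the proposed stop order: NodeManagers first, then the ResourceManager.
# YARN_CMD is a hypothetical override (defaults to "yarn") so the order can be
# checked without a cluster; the --workers/--daemon flags are assumed Hadoop 3
# launcher options, not copied from the actual patch.
YARN_CMD=${YARN_CMD:-yarn}

stop_yarn_in_order() {
  # Stop the NMs first, while the RM is still up to accept their unregistration.
  echo "Stopping nodemanagers"
  "${YARN_CMD}" --workers --daemon stop nodemanager

  # Only then stop the RM; no NM depends on it for a clean shutdown any more.
  echo "Stopping resourcemanager"
  "${YARN_CMD}" --daemon stop resourcemanager
}
```

      Running it as YARN_CMD=echo stop_yarn_in_order prints the commands instead of executing them, which is a quick way to confirm that the nodemanager stop comes first.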

      Attachments

        1. YARN-5937.01.patch
          1.0 kB
          Weiwei Yang
        2. nm_shutdown.log
          21 kB
          Weiwei Yang


          People

            Assignee: Weiwei Yang (cheersyang)
            Reporter: Weiwei Yang (cheersyang)
            Votes: 0
            Watchers: 4

