Hadoop Common
HADOOP-8353

hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.1
    • Fix Version/s: 2.0.0-alpha
    • Component/s: scripts
    • Labels:
      None

      Description

      The stop action is implemented as a simple SIGTERM sent to the JVM. There is a delay between when the action is called and when the process actually exits. This can mislead callers of the *-daemon.sh scripts, since they expect the stop action to return only once the process has actually stopped.

      I suggest we augment the stop action with a time-delay check for the process status and a SIGKILL once the delay has expired.

      I understand that sending SIGKILL is a measure of last resort and is generally frowned upon among init.d script writers, but the excuse we have for Hadoop is that it is engineered to be a fault-tolerant system, and thus there's no danger of putting the system into an inconsistent state by a violent SIGKILL. Of course, the time delay will be long enough to make a SIGKILL a rare event.
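
The proposed stop behaviour could look roughly like the following. This is a minimal sketch and not the code from the attached patches: the function name `stop_with_timeout`, its arguments, and the 5-second default are assumptions made purely for illustration.

```shell
# Hypothetical helper (not taken from the HADOOP-8353 patches).
# Usage: stop_with_timeout PID [TIMEOUT_SECONDS]
stop_with_timeout() {
  local pid="$1"
  local timeout="${2:-5}"   # assumed default, in seconds
  local waited=0

  # Nothing to do if the process is already gone.
  kill -0 "$pid" 2>/dev/null || return 0

  # Plain kill sends SIGTERM, asking the process to shut down gracefully.
  kill "$pid" 2>/dev/null

  # Check once per second whether the process is still there.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      # Last resort: the delay has expired and the process is still alive.
      echo "Process $pid did not stop within ${timeout}s of SIGTERM; sending SIGKILL" >&2
      kill -9 "$pid" 2>/dev/null
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

With a helper like this, a caller such as an init.d script would only regain control once the process has exited or has been sent SIGKILL, for example `stop_with_timeout "$(cat "$pidfile")" 10`, where `$pidfile` stands for whatever pid file the daemon script maintains.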

      Finally, there's always the option of an exponential back-off type of solution if we decide that the SIGKILL timeout is too short.
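
One way to read the back-off idea is to double the pause between liveness checks instead of polling at a fixed interval. The sketch below is only an illustration under that assumption: `wait_with_backoff`, its parameters, and the starting delay of one second are hypothetical and do not come from the patches.

```shell
# Hypothetical helper illustrating exponential back-off between checks.
# Usage: wait_with_backoff PID [MAX_CHECKS]
# Returns 0 once the process has exited, 1 if it is still alive afterwards.
wait_with_backoff() {
  local pid="$1"
  local max_checks="${2:-5}"   # assumed default: pauses of 1, 2, 4, 8, 16 seconds
  local delay=1
  local checks=0

  while kill -0 "$pid" 2>/dev/null; do
    if [ "$checks" -ge "$max_checks" ]; then
      return 1                 # still running after the last pause
    fi
    sleep "$delay"
    delay=$((delay * 2))       # double the pause before the next check
    checks=$((checks + 1))
  done
  return 0
}
```

A stop action could then send SIGTERM, call the helper, and fall back to SIGKILL only if it reports failure: `kill "$pid"; wait_with_backoff "$pid" || kill -9 "$pid"`. Compared with a fixed interval, this returns quickly for daemons that stop promptly while still giving slow ones a long grace period.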

      1. HADOOP-8353.patch.txt
        3 kB
        Roman Shaposhnik
      2. HADOOP-8353-2.patch.txt
        4 kB
        Roman Shaposhnik

        Issue Links

          Activity

          Steve Loughran made changes -
          Link This issue incorporates HADOOP-8656 [ HADOOP-8656 ]
          Steve Loughran made changes -
          Link This issue is related to HADOOP-8650 [ HADOOP-8650 ]
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1077 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1077/)
          HADOOP-8353. hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop. Contributed by Roman Shaposhnik. (Revision 1337251)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337251
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-daemon.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/bin/mr-jobhistory-daemon.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-daemon.sh
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1041 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1041/)
          HADOOP-8353. hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop. Contributed by Roman Shaposhnik. (Revision 1337251)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337251
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-daemon.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/bin/mr-jobhistory-daemon.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-daemon.sh
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2248 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2248/)
          HADOOP-8353. hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop. Contributed by Roman Shaposhnik. (Revision 1337251)

          Result = ABORTED
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337251
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-daemon.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/bin/mr-jobhistory-daemon.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-daemon.sh
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2231 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2231/)
          HADOOP-8353. hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop. Contributed by Roman Shaposhnik. (Revision 1337251)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337251
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-daemon.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/bin/mr-jobhistory-daemon.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-daemon.sh
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2305 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2305/)
          HADOOP-8353. hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop. Contributed by Roman Shaposhnik. (Revision 1337251)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337251
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-daemon.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/bin/mr-jobhistory-daemon.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-daemon.sh
          Aaron T. Myers made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 2.0.0 [ 12320352 ]
          Resolution Fixed [ 1 ]
          Aaron T. Myers added a comment -

          I've just committed this to trunk, branch-2, and branch-2.0.0-alpha.

          Thanks a lot for the contribution, Roman.

          Roman Shaposhnik added a comment -

          As far as testing goes – I just manually replaced the existing scripts in a pseudo-distributed Hadoop deployment and ran a couple of start/stop commands.

          Aaron T. Myers added a comment -

          I'm confident the test failure is unrelated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12526380/HADOOP-8353-2.patch.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.fs.viewfs.TestViewFsTrash

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/978//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/978//console

          This message is automatically generated.

          Aaron T. Myers made changes -
          Fix Version/s 2.0.0 [ 12320352 ]
          Target Version/s 2.0.0 [ 12320352 ]
          Aaron T. Myers added a comment -

          Patch looks good to me. Roman, can you comment on what testing you did of this patch?

          +1 pending Jenkins and an answer to the above question.

          Roman Shaposhnik made changes -
          Attachment HADOOP-8353-2.patch.txt [ 12526380 ]
          Roman Shaposhnik added a comment -

          Patch with updated message attached.

          Aaron T. Myers added a comment -

          The unfortunate truth there is that everything else in that script has a YARN_ prefix (I suspect because it was copied from the yarn-daemon.sh). I'd rather keep things consistent, but if you really think this lonely var should be MR_ prefixed – please let me know.

          Got it. Makes sense the way you have it.

          HDFS daemons use hadoop-daemon.sh. In fact, at this point it can be safely called hdfs-daemon.sh since I don't think anything else is really using it.

          Got it. Thanks for the explanation.

          Roman Shaposhnik added a comment -

          @Aaron,

          Perhaps we should make the message more verbose

          Agreed. I'll modify the patch to make it more obvious

          "YARN_STOP_TIMEOUT" in the MR job history server

          The unfortunate truth there is that everything else in that script has a YARN_ prefix (I suspect because it was copied from the yarn-daemon.sh). I'd rather keep things consistent, but if you really think this lonely var should be MR_ prefixed – please let me know.

          Do similar changes not need to be made for the HDFS daemons?

          HDFS daemons use hadoop-daemon.sh. In fact, at this point it can be safely called hdfs-daemon.sh since I don't think anything else is really using it.

          Aaron T. Myers added a comment -

          Patch looks pretty good to me. A few questions:

          1. Perhaps we should make the message more verbose in the event we fall back to `kill -9' ? I'm thinking something along the lines of "Daemon did not stop gracefully after signaling it X seconds ago. Trying `kill -9 $TARGET_PID'"
          2. Does it definitely make sense to call the env var "YARN_STOP_TIMEOUT" in the MR job history server? Should it not be "MR_STOP_TIMEOUT" ?
          3. Do similar changes not need to be made for the HDFS daemons?
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12526026/HADOOP-8353.patch.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/960//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/960//console

          This message is automatically generated.

          Roman Shaposhnik made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Roman Shaposhnik made changes -
          Field Original Value New Value
          Attachment HADOOP-8353.patch.txt [ 12526026 ]
          Roman Shaposhnik created issue -

            People

            • Assignee:
              Roman Shaposhnik
              Reporter:
              Roman Shaposhnik
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:
