Hadoop Common / HADOOP-6106

Provide an option in ShellCommandExecutor to timeout commands that do not complete within a certain amount of time.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: util
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In MAPREDUCE-211 we came across the need for an option to time out commands launched via the ShellCommandExecutor. The use case is the health check script being developed in MAPREDUCE-211: we do not want the TaskTracker thread to be blocked by a problematic script, or by a fork()+exec() that has hung (which has apparently been observed in large clusters).

      1. mapred-211-common-3.patch
        9 kB
        Sreekanth Ramakrishnan
      2. HADOOP-6106-2.patch
        9 kB
        Sreekanth Ramakrishnan
      3. HADOOP-6106-1.patch
        9 kB
        Sreekanth Ramakrishnan
      4. HADOOP-6106.patch
        10 kB
        Hemanth Yamijala

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #15 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/15/)

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #9 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/9/). Updated hadoop-core and test jars from Hudson trunk #12.

          Hemanth Yamijala added a comment -

          The MapReduce tests also ran; the only failures are test cases that are already logged. The jars can now be committed to the HDFS and Map/Reduce subprojects.

          Hemanth Yamijala added a comment -

          HDFS tests passed with the new jars.

          Hemanth Yamijala added a comment -

          I had a chat with Owen and Giri about how to get this dependency jar into the HDFS and MapReduce subprojects. The current school of thought (until Ivy is fixed to automate this) is to take the latest built binary from Hudson (http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Common-trunk/) and commit it to the HDFS and MapReduce subprojects, making an entry in changes.txt that references this JIRA.

          We are running HDFS and MapReduce unit tests with the latest jar to make sure tests work fine. Once that's done, we'll commit it.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #8 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/8/). Provides an option in ShellCommandExecutor to timeout commands that do not complete within a certain amount of time. Contributed by Sreekanth Ramakrishnan.

          Hemanth Yamijala added a comment -

          I just committed this. Thanks, Sreekanth!

          Hemanth Yamijala added a comment -

          +1 for the changes.

          Sreekanth Ramakrishnan added a comment -

          Output from ant test-patch

               [exec]
               [exec]
               [exec] -1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     -1 javac.  The applied patch generated 64 javac compiler warnings (more than the trunk's current 124 warnings).
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]
               [exec]     -1 release audit.  The applied patch generated 271 release audit warnings (more than the trunk's current 269 warnings).
               [exec]
          

          All test cases passed locally.

          Sreekanth Ramakrishnan added a comment -

          Attaching patch as per Hemanth's comment.

          Running ant test and test-patch again.

          Hemanth Yamijala added a comment -

          Sigh. Found one more problem. In the timer task's timeout handling, the timedout variable must be set before process.destroy() is called, because the exception is thrown asynchronously when the process is destroyed, so the code handling it may check the variable before it has been set.
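
          The ordering described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class name KillOnTimeoutTask and its fields are hypothetical, and it assumes Java 8 or later for Process.isAlive().

```java
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical timer task (names are illustrative, not taken from the patch).
 * It records the timeout BEFORE destroying the process, so that any thread
 * woken up by the destroy already sees the flag set.
 */
class KillOnTimeoutTask extends TimerTask {
    private final Process process;
    private final AtomicBoolean timedOut;

    KillOnTimeoutTask(Process process, AtomicBoolean timedOut) {
        this.process = process;
        this.timedOut = timedOut;
    }

    @Override
    public void run() {
        // Only act if the command is still running when the timer fires.
        if (process.isAlive()) {
            timedOut.set(true);   // 1. record the reason first
            process.destroy();    // 2. then kill; this may unblock the waiting thread
        }
    }
}
```

          With the two statements swapped, a thread blocked on the process could observe the failure, read the flag while it is still false, and report an ordinary error instead of a timeout.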

          Sreekanth Ramakrishnan added a comment -

          Output from ant test-patch

               [exec]
               [exec] -1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     -1 javac.  The applied patch generated 64 javac compiler warnings (more than the trunk's current 124 warnings).
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]
               [exec]     -1 release audit.  The applied patch generated 271 release audit warnings (more than the trunk's current 269 warnings).
          

          The release audit is flagged because of the changes to Shell and ShellCommandExecutor. Checking the javac warnings does not point to any of the changes made in this patch.

          All tests pass on my local box.

          Sreekanth Ramakrishnan added a comment -

          Attaching the latest patch, which fixes the findbugs warning.

          • Changed ShellTimeoutTimerTask to a private static class.

          Hemanth Yamijala added a comment -

          The patch contains the following changes:

          • Converted the timedOut variable to an atomic boolean, as it is accessed from the timer task as well as from the ShellCommandExecutor.
          • Created the Timer only if the timeout interval is > 0.
          • Set the completed variable at exactly the same places as the previous code, so as not to change the contract.
          • Cancelled the timer in the finally block of the code.
          • Refactored the constructors of ShellCommandExecutor so that they all delegate to one constructor.
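
          A rough sketch of how these pieces might fit together is shown below. It is not the Hadoop implementation: TimedCommand, its constructors, and isTimedOut() are hypothetical names used only for illustration, and the sketch assumes Java 8 or later and a command that needs no input.

```java
import java.io.IOException;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a command executor with an optional timeout. */
class TimedCommand {
    private final String[] command;
    private final long timeoutMillis;
    // Shared between the timer thread and the caller, hence atomic.
    private final AtomicBoolean timedOut = new AtomicBoolean(false);

    /** No timeout; delegates to the single full constructor. */
    TimedCommand(String[] command) {
        this(command, 0L);
    }

    TimedCommand(String[] command, long timeoutMillis) {
        this.command = command.clone();
        this.timeoutMillis = timeoutMillis;
    }

    /** Runs the command and returns its exit code. */
    int run() throws IOException, InterruptedException {
        timedOut.set(false);
        final Process process = new ProcessBuilder(command).inheritIO().start();
        Timer timer = null;
        try {
            if (timeoutMillis > 0) {            // create the Timer only when needed
                timer = new Timer("command-timeout", true);
                timer.schedule(new TimerTask() {
                    @Override
                    public void run() {
                        if (process.isAlive()) {
                            timedOut.set(true); // set the flag before destroying
                            process.destroy();
                        }
                    }
                }, timeoutMillis);
            }
            return process.waitFor();
        } finally {
            if (timer != null) {
                timer.cancel();                 // always cancel, even on failure
            }
            process.destroy();                  // clean up if we were interrupted
        }
    }

    boolean isTimedOut() {
        return timedOut.get();
    }
}
```

          For example, new TimedCommand(new String[] {"sleep", "5"}, 200L).run() should return well before five seconds on a Unix-like system, after which isTimedOut() reports true.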

          Sreekanth, can you please run ant test and test-patch so I can commit this?

          Hemanth Yamijala added a comment -

          New patch which Sreekanth and I worked on together.

          Sreekanth Ramakrishnan added a comment -

          Attaching Shell timeout feature patch.

          Hemanth Yamijala added a comment -

          Code was being reviewed in MAPREDUCE-211. Sreekanth, can you please put up the latest patch here?


            People

            • Assignee:
              Sreekanth Ramakrishnan
              Reporter:
              Hemanth Yamijala
            • Votes:
              0
              Watchers:
              0
