Hadoop Common / HADOOP-6777

Implement a functionality for suspend and resume a process.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Tags:
      herriot

      Description

      Add two methods to DaemonProtocolAspect.aj for suspending and resuming a process:

      public int DaemonProtocol.resumeProcess(String pid) throws IOException;
      public int DaemonProtocol.suspendProcess(String pid) throws IOException;
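
The patch itself is not reproduced on this page. As a rough illustration only, suspending and resuming a process by pid on a POSIX system can be sketched with the `kill` command and the STOP and CONT signals. `ProcessSignals` and its methods are hypothetical names, not the Herriot implementation; the sketch assumes a POSIX `kill` on the PATH and returns a boolean rather than the `int` shown in the signatures above.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical helper; not the code from the HADOOP-6777 patch. */
class ProcessSignals {

  /** Builds the argument list for sending a named signal to a pid. */
  public static List<String> buildCommand(String signal, String pid) {
    // Accept only plain decimal pids so nothing else reaches the command line.
    if (pid == null || !pid.matches("[0-9]+")) {
      throw new IllegalArgumentException("Invalid pid: " + pid);
    }
    return List.of("kill", "-s", signal, pid);
  }

  /** Runs the command and reports whether it exited with status 0. */
  static boolean run(List<String> command) throws IOException {
    Process p = new ProcessBuilder(command).inheritIO().start();
    try {
      return p.waitFor() == 0;
    } catch (InterruptedException e) {
      // Restore the interrupt flag before reporting the failure.
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted while waiting for " + command, e);
    }
  }

  /** Sends STOP, which pauses the target process. */
  public static boolean suspendProcess(String pid) throws IOException {
    return run(buildCommand("STOP", pid));
  }

  /** Sends CONT, which lets a stopped process continue. */
  public static boolean resumeProcess(String pid) throws IOException {
    return run(buildCommand("CONT", pid));
  }
}
```

Keeping the command construction separate from its execution makes the argument list checkable without signalling a real process.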

      1. daemonprotocolaspect.patch
        5 kB
        Vinay Kumar Thota
      2. 1753-ydist-security.patch
        4 kB
        Vinay Kumar Thota
      3. 1753-ydist-security.patch
        5 kB
        Vinay Kumar Thota
      4. 1753-ydist-security.patch
        5 kB
        Vinay Kumar Thota
      5. HADOOP-6777.patch
        4 kB
        Vinay Kumar Thota
      6. 6777-ydist-security.patch
        5 kB
        Vinay Kumar Thota

        Issue Links

          Activity

          Gavin made changes -
          Link This issue depends upon HADOOP-6332 [ HADOOP-6332 ]
          Gavin made changes -
          Link This issue depends on HADOOP-6332 [ HADOOP-6332 ]
          Gavin made changes -
          Link This issue is depended upon by MAPREDUCE-1812 [ MAPREDUCE-1812 ]
          Gavin made changes -
          Link This issue blocks MAPREDUCE-1812 [ MAPREDUCE-1812 ]
          Gavin made changes -
          Link This issue is depended upon by HDFS-1174 [ HDFS-1174 ]
          Gavin made changes -
          Link This issue blocks HDFS-1174 [ HDFS-1174 ]
          Konstantin Boudnik made changes -
          Tags herriot
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Vinay Kumar Thota made changes -
          Link This issue blocks MAPREDUCE-1812 [ MAPREDUCE-1812 ]
          Vinay Kumar Thota made changes -
          Link This issue blocks HDFS-1174 [ HDFS-1174 ]
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #349 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/349/)

          Vinay Kumar Thota made changes -
          Attachment 6777-ydist-security.patch [ 12445529 ]
          Vinay Kumar Thota added a comment -

          Patch for Yahoo distribution security branch.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #267 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/267/)
          HADOOP-6777. Implement a functionality for suspend and resume a process. Contributed by Vinay Thota.

          Konstantin Boudnik made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Konstantin Boudnik added a comment -

          I have just committed it. Thank you Vinay.

          Konstantin Boudnik made changes -
          Fix Version/s 0.21.0 [ 12313563 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12445424/HADOOP-6777.patch
          against trunk revision 947882.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/545/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/545/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/545/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/545/console

          This message is automatically generated.

          Konstantin Boudnik made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop Flags [Reviewed]
          Konstantin Boudnik added a comment -

          +1 patch looks good.

          Konstantin Boudnik made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Vinay Kumar Thota made changes -
          Attachment HADOOP-6777.patch [ 12445424 ]
          Vinay Kumar Thota added a comment -

          Excluded the configuration file and generated the new patch for common.

          Konstantin Boudnik added a comment -

          Hmm, this clearly isn't ready to be verified yet. The reason is similar to another JIRA: Common's part of Herriot doesn't have conf/system-test.xml. The file has been split between HDFS conf/system-test-hdfs.xml and MR conf/system-test-mr.xml. Therefore, the portion of this patch related to the config file needs to be moved to the appropriate subprojects. I know this is an annoyance, but it seems to be a price we are going to pay for Hadoop's separation.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12444883/1753-ydist-security.patch
          against trunk revision 947882.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/543/console

          This message is automatically generated.

          Konstantin Boudnik made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Konstantin Boudnik added a comment -

          Seems to be good to be verified.

          Konstantin Boudnik made changes -
          Summary Implement a functionality for suspend and resume a task's process. Implement a functionality for suspend and resume a process.
          Affects Version/s 0.21.0 [ 12313563 ]
          Konstantin Boudnik made changes -
          Link This issue depends on MAPREDUCE-1774 [ MAPREDUCE-1774 ]
          Konstantin Boudnik made changes -
          Link This issue depends on HADOOP-6332 [ HADOOP-6332 ]
          Konstantin Boudnik made changes -
          Project Hadoop Map/Reduce [ 12310941 ] Hadoop Common [ 12310240 ]
          Key MAPREDUCE-1753 HADOOP-6777
          Issue Type Task [ 3 ] Improvement [ 4 ]
          Component/s test [ 12311440 ]
          Component/s test [ 12312904 ]
          Konstantin Boudnik added a comment -

          It isn't about my satisfaction. It's a two-state return value, which is what the boolean type is for. The C language doesn't have one; that's why integers are used there instead.

          Now, about suspending and resuming a process: you are right, it is generic, which technically allows one to suspend a daemon VM's process and never be able to resume it. But it seems to be OK, I guess. I was totally confused by the fact that this has been tracked by a MAPREDUCE JIRA. I'm moving this ticket out to HADOOP.
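
The concern raised in this thread is that a generic suspend could freeze a daemon's own VM, which then cannot run the code to resume itself. It could be guarded against along these lines. This is only a sketch: `SuspendGuard` is a hypothetical class, not something the patch contains, and it relies on `ProcessHandle`, which exists only in Java 9 and later.

```java
/** Hypothetical guard; not part of the HADOOP-6777 patch. */
class SuspendGuard {

  /** Returns true if the pid string names the JVM running this code. */
  public static boolean isOwnProcess(String pid) {
    // ProcessHandle.current() describes the JVM executing this method.
    return Long.toString(ProcessHandle.current().pid()).equals(pid);
  }

  /** Throws if suspending this pid would freeze the caller itself. */
  public static void checkSuspendable(String pid) {
    if (isOwnProcess(pid)) {
      throw new IllegalStateException(
          "Refusing to suspend own JVM (pid " + pid + "); it could not resume itself");
    }
  }
}
```

A caller would run `checkSuspendable(pid)` before issuing the suspend command, so that only processes other than the daemon itself can be stopped.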

          Vinay Kumar Thota made changes -
          Attachment 1753-ydist-security.patch [ 12444883 ]
          Vinay Kumar Thota added a comment -

          Now changed the return type to boolean as per your satisfaction...
          Actually, I thought that whenever we execute a command, it gives an error code, like -1 for failure or 0 for success, right? I implemented the method the same way, but I returned 1 for failure instead of -1. Anyhow, I have changed the methods' return type to boolean based on your comment and uploaded the latest patch.

          Vinay Kumar Thota added a comment -

          This functionality is implemented in a generic fashion. You can suspend or resume any process if you know the process id. It's not specific to task processes, so I have given the description in a more generic way instead of mentioning the task process. As you can see, these two methods are implemented in the DaemonProtocolAspect.aj file so that they can be used for other purposes. Otherwise, if it were task level only, we could implement it in TaskTrackerAspect.aj itself.

          Konstantin Boudnik added a comment -

          "Actually I am not suspending the daemonprocess. I am just suspending the task attempt process id that is running on"

          I have edited the name of the JIRA. Please update the JavaDocs accordingly. They should clearly state that the suspend is done for a task's process.

          Konstantin Boudnik made changes -
          Summary Implement a functionality for suspend and resume the process. Implement a functionality for suspend and resume a task's process.
          Konstantin Boudnik added a comment -

          "I used the integer value because of exitcode format. Usually if you run a command and if it is not successful then return some exitcode right. I followed the same case here."

          You are introducing a Java API that performs an operation which can either succeed or fail. In Java that is true or false. In C it is 0 and 1.

          Konstantin Boudnik made changes -
          Link This issue depends on MAPREDUCE-1774 [ MAPREDUCE-1774 ]
          Konstantin Boudnik made changes -
          Link This issue requires HADOOP-6332 [ HADOOP-6332 ]
          Vinay Kumar Thota made changes -
          Attachment 1753-ydist-security.patch [ 12444331 ]
          Vinay Kumar Thota added a comment -
          • once a DaemonProcess in question is suspended (i.e. its VM is effectively suspended) will you be able to resume it via a method invocation in a class running on that VM? I haven't verified it but something tells me that it won't work.

          Actually I am not suspending the daemon process. I am just suspending the task attempt process, which is running in a separate JVM. So, in this case, we can suspend and resume the task attempt process without any issues. However, I have written those methods in a more generic manner and am keeping them in DameonCluster so they can be used for other purposes.

          • instead of returning 0 and 1 (??) return boolean
            I used the integer value because of the exit code format. Usually, if you run a command and it is not successful, it returns some exit code, right? I followed the same convention here.
          • property name test.system.task.resume.cmd isn't consistent with the rest of properties from system-test.xml. Also, the default has to be added to system-test.xml and documented properly.
          • same for @return not @ return
          • as usual, missing descriptions of @throws

          Done

          Uploaded the latest patch by addressing the given comments.

          Konstantin Boudnik added a comment -

          I have a couple of concerns about this approach:

          • once a DaemonProcess in question is suspended (i.e. its VM is effectively suspended) will you be able to resume it via a method invocation in a class running on that VM? I haven't verified it but something tells me that it won't work.

          If I am mistaken in that one then a few more:

          • property name test.system.task.resume.cmd isn't consistent with the rest of properties from system-test.xml. Also, the default has to be added to system-test.xml and documented properly.
          • instead of returning 0 and 1 (??) return boolean
          • has to be @throws not @ throws
          • same for @return not @ return
          • as usual, missing descriptions of @throws
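
The Javadoc points in the list above (tag spelling and described exceptions) might look like this once addressed. `ProcessControl` is a hypothetical interface used purely for illustration; the actual methods live in DaemonProtocolAspect.aj, and their final Javadoc wording is not shown on this page.

```java
import java.io.IOException;

/** Hypothetical interface used only to illustrate the Javadoc tags. */
interface ProcessControl {

  /**
   * Suspends the process with the given id.
   *
   * @param pid id of the process to suspend
   * @return true if the process was suspended, false otherwise
   * @throws IOException if the suspend command cannot be executed
   */
  boolean suspendProcess(String pid) throws IOException;

  /**
   * Resumes a previously suspended process.
   *
   * @param pid id of the process to resume
   * @return true if the process was resumed, false otherwise
   * @throws IOException if the resume command cannot be executed
   */
  boolean resumeProcess(String pid) throws IOException;
}
```

Each tag is written without a space after the `@`, and every `@throws` carries a description of when the exception occurs.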
          Vinay Kumar Thota made changes -
          Link This issue is required by MAPREDUCE-1731 [ MAPREDUCE-1731 ]
          Vinay Kumar Thota made changes -
          Attachment 1753-ydist-security.patch [ 12444098 ]
          Vinay Kumar Thota added a comment -

          Patch for Yahoo distribution security branch.

          Vinay Kumar Thota made changes -
          Attachment daemonprotocolaspect.patch [ 12443675 ]
          Vinay Kumar Thota made changes -
          Field Original Value New Value
          Link This issue requires HADOOP-6332 [ HADOOP-6332 ]
          Vinay Kumar Thota created issue -

            People

            • Assignee:
              Vinay Kumar Thota
              Reporter:
              Vinay Kumar Thota
            • Votes:
              0
              Watchers:
              4
