Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1167

Make ProcfsBasedProcessTree collect rss memory information

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: tasktracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Right now ProcfsBasedProcess collects only virtual memory. We can make it collect rss memory as well.
      Later we can use rss in TaskMemoryManagerThread to obtain better memory management.

      1. MAPREDUCE-1167.patch
        18 kB
        Scott Chen
      2. MAPREDUCE-1167-v2.patch
        20 kB
        Scott Chen
      3. MAPREDUCE-1167-v3.patch
        20 kB
        Scott Chen
      4. MAPREDUCE-1167-v4.patch
        19 kB
        Scott Chen
      5. MAPREDUCE-1167-v5.patch
        19 kB
        Scott Chen

        Issue Links

          Activity

          Hide
          Scott Chen added a comment -

          This patch makes ProcfsBasedProcessTree collect rss. The corresponding unit test is included.

          Show
          Scott Chen added a comment - This patch makes ProcfsBasedProcessTree collect rss. The corresponding unit test is included.
          Hide
          Scott Chen added a comment -

          I added some minor change which makes cumulativeRssmem() returns rss memory in bytes rather than page size.

          Show
          Scott Chen added a comment - I added some minor change which makes cumulativeRssmem() returns rss memory in bytes rather than page size.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Quickly looked at your patch. Few questions:

          • Can you find out and put some details as to how standard is getconf command? Given ProcfsBasedProcessTree is for Linux only, we are only concerned about Linux distributions. I can find it on my Ubuntu dev box, but RHEL?
          • I think we should keep track of rssmem everywhere in terms of bytes. This would inclue ProcessInfo.rssMem;
          • The behaviour when PAGESIZE is -ve should be changed. The patch gives out negative rss sizes. Instead we should throw exceptions in ProcessInfo.getRss(), ProcfsBasedProcessTree.getCumulativeRssMem() etc.
          Show
          Vinod Kumar Vavilapalli added a comment - Quickly looked at your patch. Few questions: Can you find out and put some details as to how standard is getconf command? Given ProcfsBasedProcessTree is for Linux only, we are only concerned about Linux distributions. I can find it on my Ubuntu dev box, but RHEL? I think we should keep track of rssmem everywhere in terms of bytes. This would inclue ProcessInfo.rssMem; The behaviour when PAGESIZE is -ve should be changed. The patch gives out negative rss sizes. Instead we should throw exceptions in ProcessInfo.getRss() , ProcfsBasedProcessTree.getCumulativeRssMem() etc.
          Hide
          Scott Chen added a comment -

          Thanks, Vinod.
          1. I will find out what is the standard way to obtain PAGESIZE.
          2. I will use bytes everywhere for rss. I agree it is more clear that way.
          3. Throwing exception in getCumulativeRssMem() is also a good suggestion. I will follow this one as well.

          Show
          Scott Chen added a comment - Thanks, Vinod. 1. I will find out what is the standard way to obtain PAGESIZE. 2. I will use bytes everywhere for rss. I agree it is more clear that way. 3. Throwing exception in getCumulativeRssMem() is also a good suggestion. I will follow this one as well.
          Hide
          Scott Chen added a comment -

          I did some survey. getconf is defined in POSIX.
          http://linux.die.net/man/1/getconf (in the bottom)
          So it should be supported on different Linux versions.

          Show
          Scott Chen added a comment - I did some survey. getconf is defined in POSIX. http://linux.die.net/man/1/getconf (in the bottom) So it should be supported on different Linux versions.
          Hide
          Scott Chen added a comment -

          1. Rename rssmem to rssmemPage to help clarify.
          2. Throws Exception when PAGESIZE is not available.

          @Vinod: After reviewing the code, I think it is better to use page number in ProcessInfo because ProcessInfo is simply a parsed version of /proc/PID directory. It should be consistent with what's in /proc. I changed the field's name from rssmem to rssmemPage and also the getters' names. I think this should be able to help clarify. Also ProcessInfo is a private class. I think as long as our public methods all use bytes it should be fine.

          Show
          Scott Chen added a comment - 1. Rename rssmem to rssmemPage to help clarify. 2. Throws Exception when PAGESIZE is not available. @Vinod: After reviewing the code, I think it is better to use page number in ProcessInfo because ProcessInfo is simply a parsed version of /proc/PID directory. It should be consistent with what's in /proc. I changed the field's name from rssmem to rssmemPage and also the getters' names. I think this should be able to help clarify. Also ProcessInfo is a private class. I think as long as our public methods all use bytes it should be fine.
          Hide
          Scott Chen added a comment -

          1. Change TestProcfsBasedProcessTree.ProcessStatInfo so that it will not affect TestTaskTrackerMemoryManager.
          2. Remove the change in TestTaskTrackerMemoryManager
          3. I make getCumulativeRssmem return 0 if PAGE_SIZE is not available

          The reason for 3 is because getCumulativeVmem and getCumulativeRssmem should be consistent.
          When /proc/ is not available, getCumulativeVmem will return 0 instead of throwing Exception.
          It is good to make them follow the same behavior.
          And these situations should not happen if the system is linux.

          Show
          Scott Chen added a comment - 1. Change TestProcfsBasedProcessTree.ProcessStatInfo so that it will not affect TestTaskTrackerMemoryManager. 2. Remove the change in TestTaskTrackerMemoryManager 3. I make getCumulativeRssmem return 0 if PAGE_SIZE is not available The reason for 3 is because getCumulativeVmem and getCumulativeRssmem should be consistent. When /proc/ is not available, getCumulativeVmem will return 0 instead of throwing Exception. It is good to make them follow the same behavior. And these situations should not happen if the system is linux.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12424681/MAPREDUCE-1167-v4.patch
          against trunk revision 834284.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12424681/MAPREDUCE-1167-v4.patch against trunk revision 834284. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/console This message is automatically generated.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          The patch looks OK. Only minor comments:

          • ProcessTreeDump is printing Rss size in pages but the header reads RSSMEM_USAGE(BYTES). This should be fixed.
          • In TestProcfsBasedProcessTree, Long.ParseLong(String) is used in many places. This is costly, you can use the long value directly if long type is needed, otherwise Long.valueOf(long) if Long type is. For example, see +408 after applying your patch.
          • TestProcfsBasedProcessTree failed. You need to modify the pattern at TestProcfsBasedProcessTree.java +188.

          The reason for 3 is because getCumulativeVmem and getCumulativeRssmem should be consistent. When /proc/ is not available, getCumulativeVmem will return 0 instead of throwing Exception.

          I think this was wrongly done and should be changed. Will file a new issue. We can keep whatever you've done for now.

          Show
          Vinod Kumar Vavilapalli added a comment - The patch looks OK. Only minor comments: ProcessTreeDump is printing Rss size in pages but the header reads RSSMEM_USAGE(BYTES). This should be fixed. In TestProcfsBasedProcessTree, Long.ParseLong(String) is used in many places. This is costly, you can use the long value directly if long type is needed, otherwise Long.valueOf(long) if Long type is. For example, see +408 after applying your patch. TestProcfsBasedProcessTree failed. You need to modify the pattern at TestProcfsBasedProcessTree.java +188. The reason for 3 is because getCumulativeVmem and getCumulativeRssmem should be consistent. When /proc/ is not available, getCumulativeVmem will return 0 instead of throwing Exception. I think this was wrongly done and should be changed. Will file a new issue. We can keep whatever you've done for now.
          Hide
          Scott Chen added a comment -

          Thanks, Vinod.

          I have fixed the issues. I tested this on my mac so it did not go through testProcessTree() on my machine. I have tested it on a linux dev box this time. I have also globally replaced all parseLong() in TestProcfsBasedProcessTree.

          I agree with you on the return 0 behavior. It should be filed in another issue.

          Show
          Scott Chen added a comment - Thanks, Vinod. I have fixed the issues. I tested this on my mac so it did not go through testProcessTree() on my machine. I have tested it on a linux dev box this time. I have also globally replaced all parseLong() in TestProcfsBasedProcessTree. I agree with you on the return 0 behavior. It should be filed in another issue.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12424758/MAPREDUCE-1167-v5.patch
          against trunk revision 835237.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12424758/MAPREDUCE-1167-v5.patch against trunk revision 835237. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/console This message is automatically generated.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          The contrib test failure is tracked at MAPREDUCE-1124.

          +1 for the latest patch.

          This is good to go. Can you ask someone to commit this?

          Show
          Vinod Kumar Vavilapalli added a comment - The contrib test failure is tracked at MAPREDUCE-1124 . +1 for the latest patch. This is good to go. Can you ask someone to commit this?
          Hide
          Scott Chen added a comment -

          Thanks for all the help, Vinod.

          I will ask Dhruba to see if he can commit this one.

          Show
          Scott Chen added a comment - Thanks for all the help, Vinod. I will ask Dhruba to see if he can commit this one.
          Hide
          dhruba borthakur added a comment -

          I will commit this in a short while.

          Show
          dhruba borthakur added a comment - I will commit this in a short while.
          Hide
          dhruba borthakur added a comment -

          I just committed this. Thanks Scott!

          Show
          dhruba borthakur added a comment - I just committed this. Thanks Scott!
          Hide
          dhruba borthakur added a comment -

          I just committed this. Thanks Scott!

          Show
          dhruba borthakur added a comment - I just committed this. Thanks Scott!
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #121 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/121/)
          . ProcfsBasedProcessTree collects rss memory information.
          (Scott Chen via dhruba)

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #121 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/121/ ) . ProcfsBasedProcessTree collects rss memory information. (Scott Chen via dhruba)
          Hide
          Scott Chen added a comment -

          @Vinod, Could you help me review MAPREDUCE-1201? It is quite similar to this one.
          I think you should be able to give a good review. Thanks.

          Show
          Scott Chen added a comment - @Vinod, Could you help me review MAPREDUCE-1201 ? It is quite similar to this one. I think you should be able to give a good review. Thanks.

            People

            • Assignee:
              Scott Chen
              Reporter:
              Scott Chen
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development