Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1167

Make ProcfsBasedProcessTree collect rss memory information

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: tasktracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Right now ProcfsBasedProcess collects only virtual memory. We can make it collect rss memory as well.
      Later we can use rss in TaskMemoryManagerThread to obtain better memory management.

      1. MAPREDUCE-1167.patch
        18 kB
        Scott Chen
      2. MAPREDUCE-1167-v2.patch
        20 kB
        Scott Chen
      3. MAPREDUCE-1167-v3.patch
        20 kB
        Scott Chen
      4. MAPREDUCE-1167-v4.patch
        19 kB
        Scott Chen
      5. MAPREDUCE-1167-v5.patch
        19 kB
        Scott Chen

        Issue Links

          Activity

          Scott Chen created issue -
          Hide
          Scott Chen added a comment -

          This patch makes ProcfsBasedProcessTree collect rss. The corresponding unit test is included.

          Show
          Scott Chen added a comment - This patch makes ProcfsBasedProcessTree collect rss. The corresponding unit test is included.
          Scott Chen made changes -
          Field Original Value New Value
          Attachment MAPREDUCE-1167.patch [ 12423608 ]
          Scott Chen made changes -
          Link This issue relates to MAPREDUCE-961 [ MAPREDUCE-961 ]
          Hide
          Scott Chen added a comment -

          I added some minor change which makes cumulativeRssmem() returns rss memory in bytes rather than page size.

          Show
          Scott Chen added a comment - I added some minor change which makes cumulativeRssmem() returns rss memory in bytes rather than page size.
          Scott Chen made changes -
          Attachment MAPREDUCE-1167-v2.patch [ 12423865 ]
          Scott Chen made changes -
          Link This issue blocks MAPREDUCE-1181 [ MAPREDUCE-1181 ]
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Quickly looked at your patch. Few questions:

          • Can you find out and put some details as to how standard is getconf command? Given ProcfsBasedProcessTree is for Linux only, we are only concerned about Linux distributions. I can find it on my Ubuntu dev box, but RHEL?
          • I think we should keep track of rssmem everywhere in terms of bytes. This would inclue ProcessInfo.rssMem;
          • The behaviour when PAGESIZE is -ve should be changed. The patch gives out negative rss sizes. Instead we should throw exceptions in ProcessInfo.getRss(), ProcfsBasedProcessTree.getCumulativeRssMem() etc.
          Show
          Vinod Kumar Vavilapalli added a comment - Quickly looked at your patch. Few questions: Can you find out and put some details as to how standard is getconf command? Given ProcfsBasedProcessTree is for Linux only, we are only concerned about Linux distributions. I can find it on my Ubuntu dev box, but RHEL? I think we should keep track of rssmem everywhere in terms of bytes. This would inclue ProcessInfo.rssMem; The behaviour when PAGESIZE is -ve should be changed. The patch gives out negative rss sizes. Instead we should throw exceptions in ProcessInfo.getRss() , ProcfsBasedProcessTree.getCumulativeRssMem() etc.
          Hide
          Scott Chen added a comment -

          Thanks, Vinod.
          1. I will find out what is the standard way to obtain PAGESIZE.
          2. I will use bytes everywhere for rss. I agree it is more clear that way.
          3. Throwing exception in getCumulativeRssMem() is also a good suggestion. I will follow this one as well.

          Show
          Scott Chen added a comment - Thanks, Vinod. 1. I will find out what is the standard way to obtain PAGESIZE. 2. I will use bytes everywhere for rss. I agree it is more clear that way. 3. Throwing exception in getCumulativeRssMem() is also a good suggestion. I will follow this one as well.
          Scott Chen made changes -
          Link This issue blocks MAPREDUCE-1195 [ MAPREDUCE-1195 ]
          Scott Chen made changes -
          Link This issue is part of MAPREDUCE-220 [ MAPREDUCE-220 ]
          Scott Chen made changes -
          Assignee Scott Chen [ schen ]
          Hide
          Scott Chen added a comment -

          I did some survey. getconf is defined in POSIX.
          http://linux.die.net/man/1/getconf (in the bottom)
          So it should be supported on different Linux versions.

          Show
          Scott Chen added a comment - I did some survey. getconf is defined in POSIX. http://linux.die.net/man/1/getconf (in the bottom) So it should be supported on different Linux versions.
          Hide
          Scott Chen added a comment -

          1. Rename rssmem to rssmemPage to help clarify.
          2. Throws Exception when PAGESIZE is not available.

          @Vinod: After reviewing the code, I think it is better to use page number in ProcessInfo because ProcessInfo is simply a parsed version of /proc/PID directory. It should be consistent with what's in /proc. I changed the field's name from rssmem to rssmemPage and also the getters' names. I think this should be able to help clarify. Also ProcessInfo is a private class. I think as long as our public methods all use bytes it should be fine.

          Show
          Scott Chen added a comment - 1. Rename rssmem to rssmemPage to help clarify. 2. Throws Exception when PAGESIZE is not available. @Vinod: After reviewing the code, I think it is better to use page number in ProcessInfo because ProcessInfo is simply a parsed version of /proc/PID directory. It should be consistent with what's in /proc. I changed the field's name from rssmem to rssmemPage and also the getters' names. I think this should be able to help clarify. Also ProcessInfo is a private class. I think as long as our public methods all use bytes it should be fine.
          Scott Chen made changes -
          Attachment MAPREDUCE-1167-v3.patch [ 12424423 ]
          Scott Chen made changes -
          Attachment MAPREDUCE-1167-v3.patch [ 12424423 ]
          Scott Chen made changes -
          Attachment MAPREDUCE-1167-v3.patch [ 12424424 ]
          Scott Chen made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Scott Chen made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          Scott Chen added a comment -

          1. Change TestProcfsBasedProcessTree.ProcessStatInfo so that it will not affect TestTaskTrackerMemoryManager.
          2. Remove the change in TestTaskTrackerMemoryManager
          3. I make getCumulativeRssmem return 0 if PAGE_SIZE is not available

          The reason for 3 is because getCumulativeVmem and getCumulativeRssmem should be consistent.
          When /proc/ is not available, getCumulativeVmem will return 0 instead of throwing Exception.
          It is good to make them follow the same behavior.
          And these situations should not happen if the system is linux.

          Show
          Scott Chen added a comment - 1. Change TestProcfsBasedProcessTree.ProcessStatInfo so that it will not affect TestTaskTrackerMemoryManager. 2. Remove the change in TestTaskTrackerMemoryManager 3. I make getCumulativeRssmem return 0 if PAGE_SIZE is not available The reason for 3 is because getCumulativeVmem and getCumulativeRssmem should be consistent. When /proc/ is not available, getCumulativeVmem will return 0 instead of throwing Exception. It is good to make them follow the same behavior. And these situations should not happen if the system is linux.
          Scott Chen made changes -
          Attachment MAPREDUCE-1167-v4.patch [ 12424681 ]
          Scott Chen made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Affects Version/s 0.22.0 [ 12314184 ]
          Affects Version/s 0.20.1 [ 12314047 ]
          Fix Version/s 0.22.0 [ 12314184 ]
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12424681/MAPREDUCE-1167-v4.patch
          against trunk revision 834284.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12424681/MAPREDUCE-1167-v4.patch against trunk revision 834284. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/136/console This message is automatically generated.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          The patch looks OK. Only minor comments:

          • ProcessTreeDump is printing Rss size in pages but the header reads RSSMEM_USAGE(BYTES). This should be fixed.
          • In TestProcfsBasedProcessTree, Long.ParseLong(String) is used in many places. This is costly, you can use the long value directly if long type is needed, otherwise Long.valueOf(long) if Long type is. For example, see +408 after applying your patch.
          • TestProcfsBasedProcessTree failed. You need to modify the pattern at TestProcfsBasedProcessTree.java +188.

          The reason for 3 is because getCumulativeVmem and getCumulativeRssmem should be consistent. When /proc/ is not available, getCumulativeVmem will return 0 instead of throwing Exception.

          I think this was wrongly done and should be changed. Will file a new issue. We can keep whatever you've done for now.

          Show
          Vinod Kumar Vavilapalli added a comment - The patch looks OK. Only minor comments: ProcessTreeDump is printing Rss size in pages but the header reads RSSMEM_USAGE(BYTES). This should be fixed. In TestProcfsBasedProcessTree, Long.ParseLong(String) is used in many places. This is costly, you can use the long value directly if long type is needed, otherwise Long.valueOf(long) if Long type is. For example, see +408 after applying your patch. TestProcfsBasedProcessTree failed. You need to modify the pattern at TestProcfsBasedProcessTree.java +188. The reason for 3 is because getCumulativeVmem and getCumulativeRssmem should be consistent. When /proc/ is not available, getCumulativeVmem will return 0 instead of throwing Exception. I think this was wrongly done and should be changed. Will file a new issue. We can keep whatever you've done for now.
          Vinod Kumar Vavilapalli made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Vinod Kumar Vavilapalli made changes -
          Link This issue blocks MAPREDUCE-1201 [ MAPREDUCE-1201 ]
          Hide
          Scott Chen added a comment -

          Thanks, Vinod.

          I have fixed the issues. I tested this on my mac so it did not go through testProcessTree() on my machine. I have tested it on a linux dev box this time. I have also globally replaced all parseLong() in TestProcfsBasedProcessTree.

          I agree with you on the return 0 behavior. It should be filed in another issue.

          Show
          Scott Chen added a comment - Thanks, Vinod. I have fixed the issues. I tested this on my mac so it did not go through testProcessTree() on my machine. I have tested it on a linux dev box this time. I have also globally replaced all parseLong() in TestProcfsBasedProcessTree. I agree with you on the return 0 behavior. It should be filed in another issue.
          Scott Chen made changes -
          Attachment MAPREDUCE-1167-v5.patch [ 12424758 ]
          Scott Chen made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12424758/MAPREDUCE-1167-v5.patch
          against trunk revision 835237.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12424758/MAPREDUCE-1167-v5.patch against trunk revision 835237. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/239/console This message is automatically generated.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          The contrib test failure is tracked at MAPREDUCE-1124.

          +1 for the latest patch.

          This is good to go. Can you ask someone to commit this?

          Show
          Vinod Kumar Vavilapalli added a comment - The contrib test failure is tracked at MAPREDUCE-1124 . +1 for the latest patch. This is good to go. Can you ask someone to commit this?
          Hide
          Scott Chen added a comment -

          Thanks for all the help, Vinod.

          I will ask Dhruba to see if he can commit this one.

          Show
          Scott Chen added a comment - Thanks for all the help, Vinod. I will ask Dhruba to see if he can commit this one.
          Hide
          dhruba borthakur added a comment -

          I will commit this in a short while.

          Show
          dhruba borthakur added a comment - I will commit this in a short while.
          Hide
          dhruba borthakur added a comment -

          I just committed this. Thanks Scott!

          Show
          dhruba borthakur added a comment - I just committed this. Thanks Scott!
          Hide
          dhruba borthakur added a comment -

          I just committed this. Thanks Scott!

          Show
          dhruba borthakur added a comment - I just committed this. Thanks Scott!
          dhruba borthakur made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #121 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/121/)
          . ProcfsBasedProcessTree collects rss memory information.
          (Scott Chen via dhruba)

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #121 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/121/ ) . ProcfsBasedProcessTree collects rss memory information. (Scott Chen via dhruba)
          Hide
          Scott Chen added a comment -

          @Vinod, Could you help me review MAPREDUCE-1201? It is quite similar to this one.
          I think you should be able to give a good review. Thanks.

          Show
          Scott Chen added a comment - @Vinod, Could you help me review MAPREDUCE-1201 ? It is quite similar to this one. I think you should be able to give a good review. Thanks.
          Tom White made changes -
          Fix Version/s 0.21.0 [ 12314045 ]
          Fix Version/s 0.22.0 [ 12314184 ]
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Gavin made changes -
          Link This issue blocks MAPREDUCE-1181 [ MAPREDUCE-1181 ]
          Gavin made changes -
          Link This issue is depended upon by MAPREDUCE-1181 [ MAPREDUCE-1181 ]
          Gavin made changes -
          Link This issue blocks MAPREDUCE-1195 [ MAPREDUCE-1195 ]
          Gavin made changes -
          Link This issue is depended upon by MAPREDUCE-1195 [ MAPREDUCE-1195 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Patch Available Patch Available Open Open
          6h 21m 2 Vinod Kumar Vavilapalli 12/Nov/09 06:46
          Open Open Patch Available Patch Available
          13d 17h 53m 3 Scott Chen 12/Nov/09 19:58
          Patch Available Patch Available Resolved Resolved
          5d 26m 1 dhruba borthakur 17/Nov/09 20:25
          Resolved Resolved Closed Closed
          280d 53m 1 Tom White 24/Aug/10 21:19

            People

            • Assignee:
              Scott Chen
              Reporter:
              Scott Chen
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development