Hadoop Map/Reduce: MAPREDUCE-1048

Show total slot usage in cluster summary on jobtracker webui

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.21.0
    • Component/s: jobtracker
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: Added occupied map/reduce slots and reserved map/reduce slots to the "Cluster Summary" table on the jobtracker web UI.

      Description

      With High-RAM jobs coming into the picture, it's important to also show the slot usage in the cluster summary, since a single task can now occupy more than one slot and so total-running-maps < total-slots-occupied.
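
A minimal sketch of why the two numbers diverge, assuming each running task is described only by the number of slots it holds. `SlotUsage` and its methods are hypothetical names for illustration and are not part of Hadoop.

```java
// Illustrative only: SlotUsage is a hypothetical helper, not a Hadoop class.
final class SlotUsage {
    private SlotUsage() {}

    /** Number of running tasks: one array entry per running task. */
    static int runningTasks(int[] slotsPerRunningTask) {
        return slotsPerRunningTask.length;
    }

    /** Slots occupied: each task contributes the number of slots it holds. */
    static int occupiedSlots(int[] slotsPerRunningTask) {
        int total = 0;
        for (int slots : slotsPerRunningTask) {
            total += slots;
        }
        return total;
    }
}
```

With two ordinary map tasks and one high-RAM map task that holds 3 slots (`{1, 1, 3}`), this gives 3 running maps but 5 occupied map slots.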

      1. mapred-1048-v1.0.patch (2 kB, Amar Kamat)
      2. mapred-1048-v1.1.patch (2 kB, Amar Kamat)
      3. patch-1048.txt (14 kB, Amareshwari Sriramadasu)
      4. patch-1048-1.txt (16 kB, Amareshwari Sriramadasu)
      5. patch-1048-2.txt (20 kB, Amareshwari Sriramadasu)
      6. patch-1048-3.txt (26 kB, Amareshwari Sriramadasu)
      7. patch-1048-0.20.txt (14 kB, Amareshwari Sriramadasu)
      8. patch-1048-4.txt (31 kB, Amareshwari Sriramadasu)
      9. patch-1048-5.txt (31 kB, Amareshwari Sriramadasu)
      10. patch-1048-6.txt (31 kB, Amareshwari Sriramadasu)
      11. patch-1048-ydist.txt (22 kB, Amareshwari Sriramadasu)
      12. MAPREDUCE-1048.patch (32 kB, Hemanth Yamijala)
      13. MAPREDUCE-1048-20.patch (28 kB, Hemanth Yamijala)

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #127 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/127/)

        Amareshwari Sriramadasu added a comment -

        +1 Y!20 patch looks fine.

        Hemanth Yamijala added a comment -

        The attached patch for the Hadoop 0.20 branch (not to be committed) contains the same changes I had suggested for the trunk patch.

        test-patch passed. Running tests locally.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #97 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/97/)
        Add occupied/reserved slot usage summary on jobtracker UI. Contributed by Amareshwari Sriramadasu and Hemanth Yamijala.

        Sharad Agarwal added a comment -

        I committed this. Thanks Amareshwari and Hemanth.

        Amareshwari Sriramadasu added a comment -

        The -1 on contrib tests is due to MAPREDUCE-1124.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12423013/MAPREDUCE-1048.patch
        against trunk revision 828979.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/90/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/90/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/90/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/90/console

        This message is automatically generated.

        Amareshwari Sriramadasu added a comment -

        +1 Changes look fine to me

        Hemanth Yamijala added a comment -

        Passing through Hudson.

        Hemanth Yamijala added a comment -

        Attaching patch that incorporates the review comments.

        Hemanth Yamijala added a comment -

        Canceling patch to fix the review comments.

        Hemanth Yamijala added a comment -

        Another issue is about jobtracker.jspx, which is now (almost) a copy of jobtracker.jsp. It seems a little odd that we must update the pages independently to keep them in sync. Also, since jobtracker.jspx returns data in an XML format, I am not sure if it is intended to be used as an interface. In that case, things could break if changes are made to it. I am thinking we should update jobtracker.jspx in a separate JIRA after confirming that users who are using it are actually OK with the change. Also, in that JIRA we could figure out ways of avoiding having to make changes to both jsp pages by sharing code.

        Hemanth Yamijala added a comment -

        This looks fine. I have a few minor comments:

        • It is confusing that incrementReservations can also decrement. I think it's better to split the calls into two.
        • ClusterMetrics javadoc needs to be updated with total number of job submissions.
        • In the javadoc of ClientProtocol for version 29, please also include total job submissions. I think we also try and include the JIRA which made the change in the comment.
        • TestClusterStatus can also have a check with a TT coming back twice, so that we can cover that the oldStatus is also used to decrement old slot counts correctly.
        • Similarly, we can also have a check with re-reservation of slots
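
The check suggested above for a TT reporting twice can be pictured with a small model. This is only a sketch under stated assumptions: `SlotTotals`, `onHeartbeat` and `onTrackerLost` are hypothetical names, and the real JobTracker bookkeeping is not shown in this thread. The point is that the tracker's previous status must be subtracted before the new one is added, otherwise a second heartbeat double-counts.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a hypothetical stand-in for cluster-wide slot bookkeeping.
final class SlotTotals {
    // Last occupied-slot count reported by each tracker (the "old status").
    private final Map<String, Integer> lastReported = new HashMap<>();
    private int occupiedSlots;

    /** Apply a status report: remove the old contribution, then add the new one. */
    synchronized void onHeartbeat(String tracker, int occupiedNow) {
        Integer old = lastReported.put(tracker, occupiedNow);
        if (old != null) {
            occupiedSlots -= old;
        }
        occupiedSlots += occupiedNow;
    }

    /** A lost tracker no longer contributes any occupied slots. */
    synchronized void onTrackerLost(String tracker) {
        Integer old = lastReported.remove(tracker);
        if (old != null) {
            occupiedSlots -= old;
        }
    }

    synchronized int occupiedSlots() {
        return occupiedSlots;
    }
}
```

A test in this style would report the same tracker twice (say 2 slots, then 3) and assert that the total is 3 rather than 5.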
        Hemanth Yamijala added a comment -

        In offline discussions with Arun and Eric, we decided to stick to my last proposal of showing the actual number for occupied slots - not including the virtual reserved slots in the count.

        Hemanth Yamijala added a comment -

        There are two possible numbers that we can show for the occupied slots. It could be the number of slots running tasks, or it could be the number of slots running tasks + the number of virtual reserved slots (virtual reserved slots for a job = number of trackers with reservations for the job * number of slots per task for the job). Showing the actual number of slots running tasks gives a more correct view, but the second number would be what is seen by the scheduler as the capacity consumed. I think it may be good to show the first number only - it is more intuitive and more correct. Thoughts?
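
The two candidate numbers can be written out as a small worked example. This is a sketch of the arithmetic only; `OccupiedSlotViews` and its method names are hypothetical, and the input values used below are made up for illustration.

```java
// Illustrative only: hypothetical helper showing the two ways to count occupied slots.
final class OccupiedSlotViews {
    private OccupiedSlotViews() {}

    /** Virtual reserved slots for a job = trackers with reservations * slots per task. */
    static int virtualReservedSlots(int trackersWithReservations, int slotsPerTask) {
        return trackersWithReservations * slotsPerTask;
    }

    /** First option: only the slots that are actually running tasks. */
    static int actualOccupied(int slotsRunningTasks) {
        return slotsRunningTasks;
    }

    /** Second option: the capacity consumed as the scheduler sees it. */
    static int schedulerView(int slotsRunningTasks, int trackersWithReservations, int slotsPerTask) {
        return slotsRunningTasks + virtualReservedSlots(trackersWithReservations, slotsPerTask);
    }
}
```

For example, with 7 slots running tasks and one tracker holding a reservation for a job that needs 3 slots per task, the first option shows 7 while the second shows 7 + 1 * 3 = 10.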

        Iyappan Srinivasan added a comment -

        +1 from QA on reservation.

        Ran some reservation specific jobs:

        1) In an 8-node cluster, run a normal job which takes up 7 slots. Then run a high-RAM job which takes up 3 slots for 1 map. This causes the high-RAM job to reserve the 1 remaining slot and wait, since it needs 2 more slots to start running. At this point:

        a) Kill the task tracker which has that 1 reserved slot and make it lost. The reserved slot should disappear.
        b) Kill the task tracker and start it again. It should again get that 1 reservation.
        c) Blacklist that tasktracker. The reservation status should reflect this accordingly. Bring the node back to a healthy state. The reservation status should again reflect this accordingly.
        d) Decommission that tasktracker. The reservation status should reflect this accordingly. Bring the node back to a healthy state. The reservation status should again reflect this accordingly.

        Amareshwari Sriramadasu added a comment -

        Patch for Yahoo distribution of branch 0.20

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12422656/patch-1048-6.txt
        against trunk revision 826767.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/191/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/191/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/191/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/191/console

        This message is automatically generated.

        Amareshwari Sriramadasu added a comment -

        Patch with comments incorporated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12422547/patch-1048-5.txt
        against trunk revision 826767.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/190/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/190/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/190/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/190/console

        This message is automatically generated.

        Sharad Agarwal added a comment -

        Changes look good to me. Minor comments:
        • It would be better to rename JobTracker#updateReservations to JobTracker#incrReservations.
        • In the jsp, all slot info should be together. We can move the occupied/reserved slot info after the Nodes column.

        Amareshwari Sriramadasu added a comment -

        re-submitting for hudson

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #117 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/117/)
        (Revert) Add occupied/reserved slot usage summary on jobtracker UI.

        Tsz Wo Nicholas Sze added a comment -

        The changes on org.apache.hadoop.examples.pi.DistSum look good to me. Thanks, Amareshwari!

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #85 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/85/)
        (Revert) Add occupied/reserved slot usage summary on jobtracker UI.

        Amareshwari Sriramadasu added a comment -

        Removed an unnecessary close from the testcase.

        Amareshwari Sriramadasu added a comment -

        Changed reservedSlots updates to happen in JobInProgress, in methods reserveTracker and unreserveTracker.
        Added running map/reduce tasks and total job submissions to ClusterMetrics and modified jobtracker.jsp to create the Cluster Summary table from ClusterMetrics alone.

        Sharad Agarwal added a comment -

        Some of the issues are applicable for trunk patch as well. Reopening the issue.

        Hemanth Yamijala added a comment -

        I am seeing some issues with the 20 patch:

        • The slot information is being accessed in an unsynchronized manner from the UI.
        • There is non-atomic access of this information. IOW, the map slots and reduce slots are being read from the UI in different calls, and a heartbeat could update them in between.

        Note that for the above two points, the cluster status model actually works correctly, because the cluster status is being read from the UI synchronized on the JobTracker and also a snapshot of the values is captured in a new ClusterStatus object when the UI reads it.

        • Reservation tracking seems broken in many ways:
          • removeTrackerReservations is being called in lostTaskTracker after reservations are cancelled. So information that needs to be removed is cleared already.
          • It seems like ExpireLaunchingTasks can result in a tracker being globally blacklisted. But I don't see any add/removeTrackerReservations in this place.
          • In processHeartbeat, there seem to be code paths where removeTrackerReservations is being called twice, for example when the tracker is decided as lost.
          • In general, it is very, very hard to verify the correctness of this patch in the current form as the logic is spread out in multiple code paths, and it is difficult to verify if all the code paths are being covered.
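
The first two points (unsynchronized and non-atomic reads) and the snapshot approach described for the cluster status can be illustrated with a small model. This is a sketch under stated assumptions, not the JobTracker code: `SlotCounters` and `Snapshot` are hypothetical names. Writers and readers share one lock, and the reader receives an immutable copy of both values taken in a single call, so a heartbeat cannot change one value between the two reads.

```java
// Illustrative only: hypothetical counters guarded by a single lock.
final class SlotCounters {
    /** Immutable copy of both counters, captured under the lock. */
    static final class Snapshot {
        final int occupiedMapSlots;
        final int occupiedReduceSlots;

        Snapshot(int occupiedMapSlots, int occupiedReduceSlots) {
            this.occupiedMapSlots = occupiedMapSlots;
            this.occupiedReduceSlots = occupiedReduceSlots;
        }
    }

    private int occupiedMapSlots;
    private int occupiedReduceSlots;

    /** Called by the (hypothetical) heartbeat path; updates both values together. */
    synchronized void update(int mapSlots, int reduceSlots) {
        this.occupiedMapSlots = mapSlots;
        this.occupiedReduceSlots = reduceSlots;
    }

    /** Called by the UI; returns a consistent pair instead of two separate getters. */
    synchronized Snapshot snapshot() {
        return new Snapshot(occupiedMapSlots, occupiedReduceSlots);
    }
}
```

A UI page would call `snapshot()` once and render both numbers from the returned object; later updates do not affect a snapshot that has already been taken.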
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #113 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/113/)
        Add occupied/reserved slot usage summary on jobtracker UI. Contributed by Amareshwari Sriramadasu.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #77 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/77/)
        Add occupied/reserved slot usage summary on jobtracker UI. Contributed by Amareshwari Sriramadasu.

        Sharad Agarwal added a comment -

        Changing to Improvement since this jira is making changes to add slot summary on Jobtracker UI.

        Sharad Agarwal added a comment -

        I just committed this. Thanks Amareshwari!

        Iyappan Srinivasan added a comment -

        +1 for QA

        patch-1048-0.20.txt

        Test scenarios covered:

        1) Ran normal, sleep, and randomwriter jobs and verified the values shown for running map tasks, running reduce tasks, occupied map slots, occupied reduce slots, reserved map slots, and reserved reduce slots.

        2) Ran high-RAM jobs with more map slots and more reduce slots in various jobs and made sure that the slot counts reflect the numbers correctly from the various task trackers.

        3) In task tracker restart and lost task tracker scenarios, the slot statistics are updated accordingly.

        4) In the JT restart scenario, the job is resubmitted and the slot values are updated accordingly.

        Amareshwari Sriramadasu added a comment -

        Patch for Yahoo hadoop branch 0.20 distribution.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12422178/patch-1048-3.txt
        against trunk revision 825083.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/171/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/171/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/171/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/171/console

        This message is automatically generated.

        Amareshwari Sriramadasu added a comment -

        Patch with review comments incorporated

        Sharad Agarwal added a comment -

        Overall looks fine. A few minor comments:

        • Move the methods JobTracker#getReserved{Map/Reduce}Slots to TaskTracker.
        • Add reserved slots to ClusterMetrics.
        • In heartbeat, the reserved slots may never be incremented again if the call gets short-circuited after the decrement. It would be better to update the reserved slots at a more granular level, in processHeartbeat and around the assignTasks call.
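
        The short-circuit concern in the last point can be illustrated with a minimal sketch. The class and method names below are hypothetical and are not taken from the patch or from the JobTracker code. The sketch assumes one possible approach: store the reserved count per tracker and adjust the cluster total by the difference in a single step, so an early return can never leave the total decremented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Hadoop code: cluster-wide reserved-slot accounting
// that is updated by delta per tracker instead of "decrement now, increment later".
final class ReservedSlotCounter {
    private final Map<String, Integer> reservedByTracker = new HashMap<>();
    private int totalReserved;

    // Record the tracker's current reservation and adjust the total in one step.
    synchronized void setReserved(String tracker, int slots) {
        Integer previous = reservedByTracker.put(tracker, slots);
        totalReserved += slots - (previous == null ? 0 : previous);
    }

    // Drop a tracker (for example when it is lost) and release its reservation.
    synchronized void removeTracker(String tracker) {
        Integer previous = reservedByTracker.remove(tracker);
        if (previous != null) {
            totalReserved -= previous;
        }
    }

    synchronized int getTotalReserved() {
        return totalReserved;
    }

    public static void main(String[] args) {
        ReservedSlotCounter counter = new ReservedSlotCounter();
        counter.setReserved("tt1", 2);
        counter.setReserved("tt2", 3);
        counter.setReserved("tt1", 0); // tt1 released its reservation
        System.out.println(counter.getTotalReserved()); // prints 3
    }
}
```

        Because each update is a single delta, a heartbeat that returns early simply skips the update and leaves the total unchanged.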

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12421976/patch-1048-2.txt
        against trunk revision 824750.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/71/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/71/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/71/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/71/console

        This message is automatically generated.

        Amareshwari Sriramadasu added a comment -

        The patch reverts the changes in ClusterStatus. It fixes ClusterMetrics.getOccupied{Map/Reduce}Slots to return slots.
        It adds occupied slots and reserved slots to the UI.

        Amareshwari Sriramadasu added a comment -

        The -1 on core tests is a known issue (MAPREDUCE-1029).

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12421711/patch-1048-1.txt
        against trunk revision 823227.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/154/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/154/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/154/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/154/console

        This message is automatically generated.

        Amareshwari Sriramadasu added a comment -

        "OK, it isn't a regression code wise. But the names don't quite reflect the intention."

        The javadoc is updated for the change.

        "We should have separate ClusterStatus.get{Map|Reduce}Tasks() and ClusterStatus.get{Map|Reduce}Slots() APIs."

        Cluster status need not know about tasks. In o.a.h.mapreduce.ClusterMetrics, the same methods are already renamed. See my comments https://issues.apache.org/jira/browse/MAPREDUCE-1048?focusedCommentId=12763430&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12763430 and
        https://issues.apache.org/jira/browse/MAPREDUCE-1048?focusedCommentId=12763004&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12763004

        Vinod Kumar Vavilapalli added a comment -

        "Also changing it the way it is suggested is a regression."

        OK, it isn't a regression code-wise. But the names don't quite reflect the intention.

        Vinod Kumar Vavilapalli added a comment -

        "ClusterStatus.getMapTasks/getReduceTasks will return number of slots occupied (not running tasks). Already ClusterStatus.getMaxMapTasks/getMaxReduceTasks return total number map/reduce slots in the cluster."

        I think we should distinguish between the two. We should have separate ClusterStatus.get{Map|Reduce}Tasks() and ClusterStatus.get{Map|Reduce}Slots() APIs.
        Also, changing it the way it is suggested is a regression.
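
        To illustrate why the two counts differ, here is a minimal sketch with hypothetical class and method names (none of them come from the patch or from the Hadoop API). It assumes only what the issue description states: a high-RAM task can hold more than one slot, so running tasks and occupied slots diverge.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Hadoop code: running tasks versus the slots they hold.
final class SlotUsageSketch {
    static final class RunningTask {
        final String id;
        final int slotsHeld; // 1 for a regular task, more for a high-RAM task

        RunningTask(String id, int slotsHeld) {
            this.id = id;
            this.slotsHeld = slotsHeld;
        }
    }

    // What a "tasks" API would report: the number of running tasks.
    static int runningTasks(List<RunningTask> tasks) {
        return tasks.size();
    }

    // What a "slots" API would report: the total number of slots held by those tasks.
    static int occupiedSlots(List<RunningTask> tasks) {
        int total = 0;
        for (RunningTask task : tasks) {
            total += task.slotsHeld;
        }
        return total;
    }

    public static void main(String[] args) {
        List<RunningTask> maps = Arrays.asList(
            new RunningTask("m1", 1),
            new RunningTask("m2", 3), // high-RAM task
            new RunningTask("m3", 2));
        // prints "3 tasks, 6 slots"
        System.out.println(runningTasks(maps) + " tasks, " + occupiedSlots(maps) + " slots");
    }
}
```

        The two numbers agree only when every task holds exactly one slot, which is the case for schedulers that do not support high-RAM jobs.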

        Amareshwari Sriramadasu added a comment -

        Patch modifies javadoc in ClusterStatus for getMaxMap/ReduceTasks also.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12421606/patch-1048.txt
        against trunk revision 819740.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/67/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/67/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/67/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/67/console

        This message is automatically generated.

        Amareshwari Sriramadasu added a comment -

        In 0.21, the methods corresponding to ClusterStatus.getMapTasks/getReduceTasks are ClusterMetrics.getOccupiedMapSlots/getOccupiedReduceSlots, and those corresponding to ClusterStatus.getMaxMapTasks/getMaxReduceTasks are ClusterMetrics.getMapSlotCapacity/getReduceSlotCapacity.

        Amareshwari Sriramadasu added a comment -

        Patch with proposed change

        Amareshwari Sriramadasu added a comment -

        This looks more like a bug than a feature. ClusterStatus.getMapTasks/getReduceTasks is called in existing code (in schedulers, examples, etc.) to get the number of slots occupied.
        I would say the cluster summary on the JobTracker web UI can show slots occupied instead of running tasks.
        The statistics displayed by Amar would then look like:

        Occupied Maps Slots | Occupied Reduces Slots | Total Submissions | Nodes | Map Slot Capacity | Reduce Slot Capacity | Avg. Tasks/Node | Blacklisted Nodes | Excluded Nodes
        30 | 76 | 55 | 38 | 228 | 76 | 8.00 | 0 | 0

        ClusterStatus.getMapTasks/getReduceTasks will return the number of slots occupied (not running tasks). ClusterStatus.getMaxMapTasks/getMaxReduceTasks already return the total number of map/reduce slots in the cluster.

        Thoughts?
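
        As a cross-check of the sample row, the sketch below recomputes the "Avg. Tasks/Node" column. The formula is an assumption: the thread does not state it, but (map capacity + reduce capacity) / nodes is consistent with the sample values. The class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper, not Hadoop code. The formula is assumed from the sample row.
final class ClusterSummarySketch {
    // Average slot capacity per node, formatted to two decimals; "-" when there are no nodes.
    static String avgTasksPerNode(int mapSlotCapacity, int reduceSlotCapacity, int nodes) {
        if (nodes <= 0) {
            return "-";
        }
        double average = (double) (mapSlotCapacity + reduceSlotCapacity) / nodes;
        return String.format(Locale.ROOT, "%.2f", average);
    }

    public static void main(String[] args) {
        // Sample row: map capacity 228, reduce capacity 76, 38 nodes.
        System.out.println(avgTasksPerNode(228, 76, 38)); // prints 8.00
    }
}
```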

        Amar Kamat added a comment -

        One disconnect here is that the slots info is computed using the jobinprogress, while the cluster summary is computed via task-tracker join-backs. Hence there is a small window where the data will be inconsistent. So other options are:

        • Show slots info along with running jobs and show the total count at the end.
        • Show the slots info in a separate section.
        • Keep slots info in the cluster summary but do not share it with clients (i.e. don't serialize it).

        Thoughts?
        Amar Kamat added a comment -

        Attaching a new patch.

        Amar Kamat added a comment -

        Attaching a patch for review. Very trivial patch. Testing in progress.

        Hemanth Yamijala added a comment -

        As discussed offline with Amar, a couple of points to consider:

        • Should we display it as Amar defined above, or should we show separate columns? Showing separate columns might make it appear more like an extension than a change to the existing interface. Put differently: is it OK to change the web interface as shown above?
        • The second point is that other schedulers which don't support high-RAM jobs will treat tasks and slots identically, so it could be redundant information for them. Is this OK?
        Amar Kamat added a comment -

        In the cluster summary section on the jobtracker webui, map tasks/slots information would make more sense. Example:

        Maps Tasks/Maps Slots | Reduces Tasks/Reduces Slots | Total Submissions | Nodes | Map Task Capacity | Reduce Task Capacity | Avg. Tasks/Node | Blacklisted Nodes | Excluded Nodes
        10/30 | 38/76 | 55 | 38 | 228 | 76 | 8.00 | 0 | 0

          People

          • Assignee:
            Amareshwari Sriramadasu
            Reporter:
            Amar Kamat
          • Votes:
            0
            Watchers:
            10
