Hadoop Common / HADOOP-4445

Wrong number of running map/reduce tasks is displayed in queue information.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Hadoop r705159, Queue=default, GC=100%, MapCapacity=ReduceCapacity=212

    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Changed JobTracker UI to better present the number of active tasks.

      Description

      Wrong number of running map/reduce tasks is displayed in queue information.

      Attachments

      1. runningjobs.png
        55 kB
        Sreekanth Ramakrishnan
      2. HADOOP-4445-4.patch
        15 kB
        Sreekanth Ramakrishnan
      3. HADOOP-4445-3.patch
        13 kB
        Sreekanth Ramakrishnan
      4. HADOOP-4445-2.patch
        12 kB
        Sreekanth Ramakrishnan
      5. HADOOP-4445-1.patch
        11 kB
        Sreekanth Ramakrishnan

          Activity

          Karam Singh added a comment -

          Submitted a job with map and reduce tasks = 212.
          When the job started getting slots, the JobTracker UI displayed the following "Cluster Summary" and "Scheduling Information":

          Maps=9, Reduces=0, Total Submissions=5, Nodes=106, Map Task Capacity=212, Reduce Task Capacity=212, Avg. Tasks/Node=4.00

          Scheduling Information shows:
          Queue Name=default
          Guaranteed Capacity Maps (%) :100.0
          Guaranteed Capacity Reduces (%) :100.0
          Current Capacity Maps : 212
          Current Capacity Reduces : 212
          User Limit : 25
          Reclaim Time limit : 300
          Number of Running Maps : 105
          Number of Running Reduces : 8
          Number of Waiting Maps : 107
          Number of Waiting Reduces : 204
          Priority Supported : YES

          This discrepancy shows up only while the job is starting to get slots.

          Sreekanth Ramakrishnan added a comment -

          The disparity arises because, when a task is scheduled for execution, the Capacity Scheduler increments the count of tasks running in the queue and only then passes the actual list of tasks to the TaskTracker. So there is a gap between what is actually running and what has merely been scheduled to run on the TaskTracker.

          So I think we should change "Number of Running <task>" to represent "Running <task> (scheduled and to be scheduled)".

          Would that make sense to the end user?

          Hemanth Yamijala added a comment -

          Once the number of tasks is incremented in the scheduler, we have assigned the task to the TaskTracker. So, when it comes back in the next heartbeat, the count is going to be incremented on the JT. Is this understanding correct?

          If yes, when the TTs come in for the next heartbeat, the difference in maps should not be as high as shown in the bug comment. Is this happening?

          Sreekanth Ramakrishnan added a comment -

          I had an offline discussion with Karam about this issue. As mentioned in his comment, this indeed happens for a short duration between heartbeats, while the task slots are actually being assigned to a particular job, and the cluster information and scheduling information get synched up in the next heartbeat.

          Hemanth Yamijala added a comment -

          In that case, I am wondering whether it makes sense to not show this information at all in the scheduler's UI. Essentially it is a measure of how many tasks are scheduled, which would mostly become equal to the number of tasks running by the next heartbeat. By removing this information, the user does not get to see how much is scheduled for a while; I think that is OK. So I would vote for removing this info from the scheduler UI. Others?

          Vinod Kumar Vavilapalli added a comment -

          +1 for removing it from the scheduler information displayed, given that there is nothing else we can do. But I think this should be done only after fixing HADOOP-4444, i.e. after we make sure that at steady state they are indeed equal.

          Sreekanth Ramakrishnan added a comment -

          I think we should remove the redundant information displayed by the task scheduler, as we already display the number of running maps and reduces both cluster-wide and split per job in the job details table.

          Hemanth Yamijala added a comment -

          I am marking HADOOP-4444 as a blocker for this bug, as we would not be able to verify the fix for this patch if this issue were fixed first. I spoke to Vinod and found that's the same reason why he too wants HADOOP-4444 fixed first.

          Sreekanth Ramakrishnan added a comment -

          I am marking HADOOP-4446 as a blocker for this issue because it changes the scheduling information format and this issue also requires a change in the same area, so the patches would conflict; otherwise we would have to fold both issues into a single patch.

          Vinod Kumar Vavilapalli added a comment -

          The number of Waiting {Maps|Reduces} is derived from the running counts, which are not synchronized with the exact counts all the time. So they won't be correct all the time, even though at the point of their actual usage in the scheduler we make sure that they are correct. Hence, the waiting counts should be removed from the scheduler information.

          The correct waiting counts can be maintained by the JT (just like the running counts) and stored in the cluster summary; I will file a JIRA to get that information there.

          Vivek Ratan added a comment -

          Hemanth and I looked at what's going on here. Essentially, there are two sources of truth regarding the number of running tasks in the system. Each JobInProgress object maintains counts of running map and reduce tasks. These counts are incremented when a task is assigned to a TT (in obtainNewMapTask() or obtainNewReduceTask()), and they are used by the CapacityScheduler. The cluster summary, represented by the ClusterStatus object, also contains counts of the total number of running map and reduce tasks. These are incremented by the JT using the TT status. The counts maintained by the JobInProgress objects and the ClusterStatus object are off by a heartbeat: the former increments its counts when a task is assigned, while the task's running status is conveyed to the JT only in the TT's next heartbeat. During startup, a lot of TTs approach the JT for tasks to run, so the counts of running tasks across all JobInProgress objects are much higher than the cluster count, since the cluster count is updated only when the TTs report their status in their next heartbeat. That explains the discrepancy reported in this JIRA. In steady state, the two counts are mostly identical, or off by a little, as TTs finish their tasks at different times.

          This is not really a bug, as it's not clear which count is 'correct'. We're reporting from two different sources: the cluster summary and the scheduler (which gets its info from the JobInProgress objects). But different numbers do get reflected in the UI. So the best fix is probably to indicate in the scheduler part of the UI that its computation is off from the cluster summary by a heartbeat. Maybe a little explanation at the bottom that says something like: "This info varies from that of the cluster summary by a heartbeat".

          I don't think we should change anything in the scheduler or the cluster summary. They're both doing the right thing in their own way. An alternate solution is to have the cluster summary use the counts from the JobInProgress objects, but this is performance-intensive, and that was presumably the reason why the cluster summary maintains its own count.

          You do want to leave the rest of the UI as is. The cluster summary is useful, as is the per-queue information of running tasks (reported by the scheduler), since it lets users know whether the queue is running above/at/below its guaranteed capacity.

          Hence, the waiting counts should be removed from the scheduler information.

          The scheduler maintains a partial waiting count of map/reduce tasks. It doesn't need to know the total number of pending tasks if this total is larger than the cluster capacity. So, for performance reasons, it only counts up to the cluster capacity. HADOOP-4576 has been opened for this purpose and suggests that we display pending jobs instead of pending tasks, as the former seems more useful to users.
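          The heartbeat lag described above can be pictured with a tiny, self-contained sketch (hypothetical class and method names, not the actual JobTracker code): the JobInProgress-style count moves at assignment time, while the ClusterStatus-style count moves only when a tracker's next heartbeat reports the task as running.

          class CountLagSketch {
              static int schedulerRunningMaps = 0;   // JobInProgress-style count
              static int clusterRunningMaps = 0;     // ClusterStatus-style count

              // Called when the JobTracker hands a map task to a TaskTracker.
              static void assignMapTask() {
                  schedulerRunningMaps++;            // incremented immediately
              }

              // Called when that TaskTracker's next heartbeat reports the task running.
              static void heartbeatReportsRunningMap() {
                  clusterRunningMaps++;              // incremented one heartbeat later
              }

              public static void main(String[] args) {
                  // Many trackers get tasks assigned in the same scheduling pass...
                  for (int i = 0; i < 105; i++) assignMapTask();
                  // ...but only a few have heartbeated back so far.
                  for (int i = 0; i < 9; i++) heartbeatReportsRunningMap();
                  // Mirrors the original report: scheduler shows 105, cluster summary shows 9.
                  System.out.println("scheduler view = " + schedulerRunningMaps
                      + ", cluster summary = " + clusterRunningMaps);
              }
          }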

          Amar Kamat added a comment -

          Vivek's comment makes sense. I guess the confusion is because of the lack of information provided along with the counts. If we fix the naming and say

          running task : a task that is currently executing on some tracker
          scheduled task : a task that has been scheduled by the scheduler/JT for running, and which might not actually be running yet

          then we can show running-tasks info in the cluster summary and scheduled-tasks info in the scheduler's info and get away with the issue. Thoughts?

          Vivek Ratan added a comment -

          then we can show running-tasks info in cluster-summary and scheduled-tasks info in the scheduler's info and get away with the issue.

          I personally don't think you want to introduce such fine-grained differences. Having the user think of 'running' tasks and 'scheduled' tasks separately seems unnecessary as it doesn't buy them much. If the numbers are off by a bit, that doesn't seem so bad, as long as you can maybe add a little disclaimer, or provide an explanation in a FAQ. My guess is, users will really be concerned about the stats for their queue. As long as they can see whether a queue is fully occupied or not, and where they are in the waiting list, they should be fine.

          Amar Kamat added a comment -

          If the numbers are off by a bit

          At most, the numbers can be off by the number of nodes in the cluster, since in the worst case every TaskTracker has newly assigned tasks that it has not yet reported back in a heartbeat.

          Hemanth Yamijala added a comment -

          There are a few more issues with this information display. I had an offline discussion with Vivek, and we came up with a few observations and ideas.

          • The information is accessed by the UI update thread and the updateQSI method without proper synchronization. This should be addressed. The QueueSchedulingInfo object is a simple data object, and the information in a given instance should be accessed together. Currently, the scheduler's access to this object is synchronized via the TaskSchedulingMgr object. Maybe then, instead of accessing the QSI fields directly, the UI should access them via the TaskSchedulingMgr.
          • The capacity scheduler also updates only the reduce scheduler or the map scheduler in a given heartbeat. So in a scenario where reduce tasks are finishing along with map tasks, since we update the reduce scheduler in preference to the map scheduler, the information for the map tasks could be off by more than a heartbeat. However, in a steady state, this may not be that big an issue.

          There are some options to address this:

          • We could make it explicit that the information is not synch'ed with the cluster summary (as mentioned by Vivek above, though the information should probably not be treated as off by only a heartbeat)
          • We could ensure that the information for each of the map and reduce schedulers is updated at least once every so often, e.g. once every 3 heartbeats or so.
          • We could also have an updater thread that runs periodically and updates the numbers every time it runs (a rough sketch appears at the end of this comment). We could use the same code for updates as the updateQSI method itself, though it could maintain a separate copy of the data, so as not to introduce synchronization constraints on the scheduler.

          The advantage of the last two approaches is that we could deterministically say how far off the scheduling info would be compared to the cluster status. For example, if the updater thread runs once every 30 seconds, we could say the information would be off by at most 30 seconds.

          Since in any case it appears that the information cannot be completely in sync, maybe we should keep it simple for now, mark that the information is not synchronized with the cluster status, and see if in steady state the information is way off. If that happens, we could fix it using one of the methods stated above. Thoughts?
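          For what it's worth, here is a rough, self-contained sketch of the updater-thread option (hypothetical class and method names, not part of the scheduler), assuming a fixed refresh period such as 30 seconds:

          import java.util.concurrent.Executors;
          import java.util.concurrent.ScheduledExecutorService;
          import java.util.concurrent.TimeUnit;
          import java.util.concurrent.atomic.AtomicReference;

          class QueueInfoSnapshotter {
              static final class Snapshot {
                  final int runningMaps;
                  final int runningReduces;
                  Snapshot(int maps, int reduces) {
                      this.runningMaps = maps;
                      this.runningReduces = reduces;
                  }
              }

              private final AtomicReference<Snapshot> latest =
                  new AtomicReference<Snapshot>(new Snapshot(0, 0));
              private final ScheduledExecutorService timer =
                  Executors.newSingleThreadScheduledExecutor();

              // Placeholder for whatever walks the running jobs (updateQSI-style code);
              // it maintains a separate copy of the data, so the scheduler's hot path
              // needs no extra synchronization.
              Snapshot recomputeSnapshot() {
                  return new Snapshot(0, 0);
              }

              // Refresh the snapshot every periodSeconds; the UI only reads 'latest',
              // so it is at most one period behind the scheduler's view.
              void start(final long periodSeconds) {
                  timer.scheduleAtFixedRate(new Runnable() {
                      public void run() {
                          latest.set(recomputeSnapshot());
                      }
                  }, 0, periodSeconds, TimeUnit.SECONDS);
              }

              Snapshot current() {
                  return latest.get();
              }
          }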

          Sreekanth Ramakrishnan added a comment -

          Instead of relying on the freshness of the queue information objects that the scheduler uses to maintain its scheduling information, we can compute the number of running tasks at the time the information is requested by the user.

          The following can be done (a rough sketch appears at the end of this comment):

          • Call JobQueueManager.getRunningJobs(queue).
          • Iterate through the list of jobs, adding each job's running tasks to the queue's running task count.

          We should also fix HADOOP-4576 along with this so that we don't iterate through all the waiting jobs in order to compute the total pending tasks in a queue.

          The only downside of this approach is that on every UI request we would iterate through the running jobs to compute the number of running tasks, which might be a performance problem.

          Any comments on the approach described above?
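          A minimal sketch of this on-demand computation, with RunningJob standing in for JobInProgress and the list standing in for the result of JobQueueManager.getRunningJobs(queue) (hypothetical names, not the actual scheduler code):

          import java.util.List;

          class RunningTaskCounter {
              // Stand-in for JobInProgress; only the two counts used here.
              interface RunningJob {
                  int runningMaps();
                  int runningReduces();
              }

              // Walk the queue's running jobs at request time and sum their counts.
              // Jobs that are initialized but not yet scheduled simply contribute zero.
              static int[] countRunningTasks(List<RunningJob> runningJobsInQueue) {
                  int maps = 0;
                  int reduces = 0;
                  for (RunningJob job : runningJobsInQueue) {
                      maps += job.runningMaps();
                      reduces += job.runningReduces();
                  }
                  return new int[] { maps, reduces };
              }
          }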

          Hemanth Yamijala added a comment -

          We can probably check what happens with a reasonable number of running jobs per queue, say 50 jobs. If the performance isn't a concern, this may be a reasonable approach.

          Hemanth Yamijala added a comment -

          We should also check that if this info is periodically checked, say once every 30 seconds or so (equal to a refresh interval on the web page), the performance is still fine.

          In case this turns out to be an issue, caching the values may be an option. We can update the cache periodically as I suggested above.

          Sreekanth Ramakrishnan added a comment -

          Checked with 80 jobs in a queue: the maximum time taken to get running and pending tasks was 2 milliseconds (inclusive of scheduler.getJobs()), refreshing the JobTracker and queue details pages every 15 seconds.

          So if we have 10 queues, we would have a maximum delay of 20 milliseconds when a user requests the information. This can be further reduced once HADOOP-4576 goes in, as then we would count tasks only in the running job queue and subtract the running job count from the total job count to get the waiting job count.

          Sreekanth Ramakrishnan added a comment -

          Marking HADOOP-4576 as a blocker for this issue, as we are optimizing the calculation of the waiting job count in HADOOP-4576 and then fixing the running count in this JIRA.

          Sreekanth Ramakrishnan added a comment -

          Attaching a patch fixing this issue:

          The patch does the following:

          1. In SchedulingInfo.toString(), it calls JobQueueManager.getRunningJobs(queue), iterates through the running job list, and accumulates the number of running tasks in the queue. We need not worry about jobs that are initialized but not yet scheduled, as those jobs have zero running maps and reduces.
          2. Modified testSchedulingInfo() to assign a map and a reduce task to the tracker and check the number of running tasks, then finish the job and check the counts again, and then check the case where a running job fails.

          I have moved the comment from within testSchedulingInfo() to outside it. I have removed FakeJobInProgress.fail() and am using the taskTrackerManager.finalizeJob API instead, as the latter raises the events required for cleaning up the running job queues.

          Amar Kamat added a comment -

          A few comments:
          TestCapacityScheduler
          1)

          -    String[] infoStrings = schedulingInfo.split("\n");
          -    
          +    String[] infoStrings = schedulingInfo.split("\n");  
          

          Unnecessary diff.

          2) I feel we should break testSchedulingInformation() into 3 sub-testcases, namely

          • testQueueSchedulingInformation()
          • testRunningSchedulingInformation()
          • testWaitingSchedulingInformation()
            etc., or something like that. The reason is that any small change might require understanding the complete test case and also making/adjusting changes to support the complete testcase as a whole.
          Amar Kamat added a comment -

          Had an offline discussion with Sreekanth, and it seems we can hold off on the test case splitting for now. Also, we should add a disclaimer that the cluster status and scheduler info will be off from each other. I personally don't think it's worth mentioning the number by which they will be off.

          Sreekanth Ramakrishnan added a comment -

          Attaching a screenshot to show what would be displayed to the user.

          Sreekanth Ramakrishnan added a comment -

          Incorporating Amar's comment in the patch.

          I am not splitting the test case in this patch, as I am testing the default scheduling information plus the running and waiting counts side by side. Even if we were to split the test case, the logical division would be running, waiting and default scheduling information, because the running and waiting information can be tested side by side.

          The disclaimer that I was testing for has been removed, as I have changed the disclaimer. Can you please check the same?

          Sreekanth Ramakrishnan added a comment -

          After an offline discussion with Hemanth and Amar, it was decided that we would show used maps and reduces as percentages instead of running maps and reduces. Attaching a screenshot with the changes made to display percentages.
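          Assuming the percentage is simply running tasks over the queue's current capacity, the computation would look roughly like this (a hedged sketch; the actual patch may format it differently):

          class UsedPercent {
              // Hypothetical helper: "Used Maps (%)" as running maps over the
              // queue's current map capacity.
              static String usedMapsPercent(int runningMaps, int mapCapacity) {
                  if (mapCapacity <= 0) {
                      return "0.0";   // guard against a queue with no capacity yet
                  }
                  return String.format("%.1f", (runningMaps * 100.0) / mapCapacity);
              }
          }

          For the numbers in the original report (105 running maps against a capacity of 212), this would display 49.5.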

          Sreekanth Ramakrishnan added a comment -

          Changes made to display maps and reduces as Used Maps (%) and Used Reduces (%); made the appropriate changes in the test case.

          Sreekanth Ramakrishnan added a comment -

          Had an offline discussion with Amar and incorporated the changes he suggested. The attached screenshot shows the UI that would be presented to the user.

          Amar Kamat added a comment -

          I feel "Used Maps" doesn't convey the sense that the maps are scheduled; "Scheduled Maps" or "Running Maps" might make more sense. Also, there is no clear-cut distinction between which information is configuration and which is dynamic. I feel we should have 2 sections, namely Config/Constants and Runtime Status. Disclaimers might actually confuse the users.

          Sreekanth Ramakrishnan added a comment -

          Attaching the latest screenshot incorporating all the changes.

          Sreekanth Ramakrishnan added a comment -

          Attaching a patch incorporating the comments above.

          Amar Kamat added a comment -

          +1. Looks good.

          Hemanth Yamijala added a comment -

          Output of test-patch:

          [exec] +1 overall.

          [exec] +1 @author. The patch does not contain any @author tags.

          [exec] +1 tests included. The patch appears to include 4 new or modified tests.

          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.

          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.

          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          Hemanth Yamijala added a comment -

          Ant test-contrib passed without errors. The patch only touches capacity scheduler and nothing in core. So, I did not run core tests.

          Hemanth Yamijala added a comment -

          I just committed this. Thanks, Sreekanth!

          Hudson added a comment -

          Integrated in Hadoop-trunk #680 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/680/)

          Robert Chansler added a comment -

          Edit release note for publication.


            People

            • Assignee: Sreekanth Ramakrishnan
            • Reporter: Karam Singh
            • Votes: 0
            • Watchers: 2
