Hadoop Common
  1. Hadoop Common
  2. HADOOP-4576

Modify pending tasks count in the UI to pending jobs count in the UI

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Changed capacity scheduler UI to better present number of running and pending tasks.

      Description

      The UI for capacity scheduler displays 'pending tasks' counts. However the capacity scheduler does not update these counts to be the actual values for optimization purposes, for e.g. to avoid walking all pending jobs on all heartbeats. Hence this information is not very accurate.

      Also, while 'running tasks' counts are useful to compare against capacities and limits, 'pending tasks' counts do not add too much user value. A better count to display would be the number of running and pending jobs.

      1. HADOOP-4576-1.patch
        6 kB
        Sreekanth Ramakrishnan
      2. HADOOP-4576-2.patch
        10 kB
        Sreekanth Ramakrishnan
      3. HADOOP-4576-3.patch
        9 kB
        Sreekanth Ramakrishnan
      4. HADOOP-4576-4.patch
        11 kB
        Sreekanth Ramakrishnan
      5. HADOOP-4576-5.patch
        11 kB
        Sreekanth Ramakrishnan
      6. HADOOP-4576-6.patch
        10 kB
        Sreekanth Ramakrishnan
      7. waitingjobs1.png
        62 kB
        Sreekanth Ramakrishnan

        Issue Links

          Activity

          Hide
          Sreekanth Ramakrishnan added a comment -

          Attaching patch, which displays running jobs and waiting jobs instead of running and waiting task counts for maps and reduces.

          Waiting jobs and running jobs are computed after getting all jobs which are present in the scheduler and checking their status, this is done to incorporate changes which would be made in HADOOP-4513

          Show
          Sreekanth Ramakrishnan added a comment - Attaching patch, which displays running jobs and waiting jobs instead of running and waiting task counts for maps and reduces. Waiting jobs and running jobs are computed after getting all jobs which are present in the scheduler and checking their status, this is done to incorporate changes which would be made in HADOOP-4513
          Hide
          Sreekanth Ramakrishnan added a comment -

          Attaching output of ant test-patch:

               [exec] +1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec] 
          
          Show
          Sreekanth Ramakrishnan added a comment - Attaching output of ant test-patch: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec]
          Hide
          Sreekanth Ramakrishnan added a comment -

          Attaching new patch, this patch does not iterate thro' job list to find count of waiting jobs,instead introduces a counter in the QueueInfo object in JobQueueManager to maintain a count of waiting jobs which is incremented and decremented (by JobInitializationPoller). This would mean that in HADOOP-4445 we just need to iterate thro' running jobs queue to find number of running tasks for that queue.

          Show
          Sreekanth Ramakrishnan added a comment - Attaching new patch, this patch does not iterate thro' job list to find count of waiting jobs,instead introduces a counter in the QueueInfo object in JobQueueManager to maintain a count of waiting jobs which is incremented and decremented (by JobInitializationPoller). This would mean that in HADOOP-4445 we just need to iterate thro' running jobs queue to find number of running tasks for that queue.
          Hide
          Sreekanth Ramakrishnan added a comment -

          Attaching output of ant test-patch :

               [exec] +1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          
          
          Show
          Sreekanth Ramakrishnan added a comment - Attaching output of ant test-patch : [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          Hide
          Sreekanth Ramakrishnan added a comment -

          Attaching a new patch merged with current trunk.

          Plus have removed the variable which I introduced in previous patch, it was not required as we can find out number of waiting jobs directly from size of the jobList instead of having a counter.

          Also added a note in scheduling information stating that scheduling information can be off at the maximum by polling interval of the job initialization poller which does clean up of the waiting jobs.

          Show
          Sreekanth Ramakrishnan added a comment - Attaching a new patch merged with current trunk. Plus have removed the variable which I introduced in previous patch, it was not required as we can find out number of waiting jobs directly from size of the jobList instead of having a counter. Also added a note in scheduling information stating that scheduling information can be off at the maximum by polling interval of the job initialization poller which does clean up of the waiting jobs.
          Hide
          Amar Kamat added a comment -

          Comments :
          JobQueuesManager.java :
          1) No need to import AtomicInteger. Also plz remove extra diffs.

          CapacityTaskScheduler.java :
          1) You cant simply change the constructor def. Overload it and deprecate the other if needed.

          2) Extra diffs w.r.t.

          -         supportsPriority?"YES":"NO"));      
          +         supportsPriority?"YES":"NO"));
          

          3)

          + sb.append(String.format("* Scheduling information can be off by " +
          + "maximum of %d seconds\n", pollingInterval/1000));


          Use StringUtils for formatting time.

          TestCapacityScheduler.java
          1)

          + scheduler.assignTasks(tracker("tt1")); // heartbeat
          + p.selectJobsToInitialize();


          The ordering doesnt seem right. You submit 5 jobs, try to assign tasks (which should be no-op) and then you init the jobs.

          2) Shouldnt we also check/test the timing issue that after poll-interval units of time the values are correct. Something like

          • add jobs
          • check the queue-sched-info
          • allow the jobs to be inited by the poller i.e wait for poller-interval time
          • check again to see if the change is made.

          2) Plz mark the start and end of a new sub-test using comments

          Show
          Amar Kamat added a comment - Comments : JobQueuesManager.java : 1) No need to import AtomicInteger . Also plz remove extra diffs. CapacityTaskScheduler.java : 1) You cant simply change the constructor def. Overload it and deprecate the other if needed. 2) Extra diffs w.r.t. - supportsPriority? "YES" : "NO" )); + supportsPriority? "YES" : "NO" )); 3) + sb.append(String.format("* Scheduling information can be off by " + + "maximum of %d seconds\n", pollingInterval/1000)); Use StringUtils for formatting time. TestCapacityScheduler.java 1) + scheduler.assignTasks(tracker("tt1")); // heartbeat + p.selectJobsToInitialize(); The ordering doesnt seem right. You submit 5 jobs, try to assign tasks (which should be no-op) and then you init the jobs. 2) Shouldnt we also check/test the timing issue that after poll-interval units of time the values are correct. Something like add jobs check the queue-sched-info allow the jobs to be inited by the poller i.e wait for poller-interval time check again to see if the change is made. 2) Plz mark the start and end of a new sub-test using comments
          Hide
          Sreekanth Ramakrishnan added a comment -

          Attaching latest patch incorporating Amar's comments.

          With respect to the test cases:

          Following is the order we are testing the scheduling information:

          • Submit 5 jobs to a queue.
          • Check the waiting jobs count, it should be 5.
          • Then run initializationPoller(), this initializes first two jobs in the queue but does not raise the job status changed event.
          • Check once again the waiting queue, it should be 5 jobs again.
          • Then raise status change events.
          • Assign one task to a task tracker.
          • Check waiting job count, it should be 4 now.
          • Then pick an initialized job but not scheduled job and fail it.
          • Run the poller, since poller is responsible for removing, failed jobs and scheduled jobs from job queue maintained by job queue manager, there is no requirement for raising status changed event as the job is cleared without it by the poller once job fails before it gets scheduled. With respect to removal of the jobs which are scheduled and failing is taken care by the JobQueueManager.
          • Check the waiting job count should now be 3.
          • Now fail a job which has not been initialized at all.
          • Run the poller, so that it can clean up the job queue.
          • Check the count, the waiting job count should be 2.

          I hope this answers the doubt with regarding test case, I have also mentioned the steps as inline comment in the test case.

          Show
          Sreekanth Ramakrishnan added a comment - Attaching latest patch incorporating Amar's comments. With respect to the test cases: Following is the order we are testing the scheduling information: Submit 5 jobs to a queue. Check the waiting jobs count, it should be 5. Then run initializationPoller(), this initializes first two jobs in the queue but does not raise the job status changed event. Check once again the waiting queue, it should be 5 jobs again. Then raise status change events. Assign one task to a task tracker. Check waiting job count, it should be 4 now. Then pick an initialized job but not scheduled job and fail it. Run the poller, since poller is responsible for removing, failed jobs and scheduled jobs from job queue maintained by job queue manager, there is no requirement for raising status changed event as the job is cleared without it by the poller once job fails before it gets scheduled. With respect to removal of the jobs which are scheduled and failing is taken care by the JobQueueManager. Check the waiting job count should now be 3. Now fail a job which has not been initialized at all. Run the poller, so that it can clean up the job queue. Check the count, the waiting job count should be 2. I hope this answers the doubt with regarding test case, I have also mentioned the steps as inline comment in the test case.
          Hide
          Amar Kamat added a comment -

          +1. Looks good.

          Show
          Amar Kamat added a comment - +1. Looks good.
          Hide
          Sreekanth Ramakrishnan added a comment -

          Attaching patch with merging with today's trunk.

          Show
          Sreekanth Ramakrishnan added a comment - Attaching patch with merging with today's trunk.
          Hide
          Hemanth Yamijala added a comment -

          I think it is not necessary to deprecate the constructor of SchedulingInfo. It is a private static inner class and hence nobody should be affected ? It would just add more code to maintain. Can you please submit a new patch removing the deprecation and just changing the api ? Please also remember to remove the checks for null in the toString method, as the object is not expected to be null after this change.

          BTW, I spoke to Amar about this and we agree on this point now.

          Show
          Hemanth Yamijala added a comment - I think it is not necessary to deprecate the constructor of SchedulingInfo. It is a private static inner class and hence nobody should be affected ? It would just add more code to maintain. Can you please submit a new patch removing the deprecation and just changing the api ? Please also remember to remove the checks for null in the toString method, as the object is not expected to be null after this change. BTW, I spoke to Amar about this and we agree on this point now.
          Hide
          Sreekanth Ramakrishnan added a comment -

          Changing visibility of the constructor and removing the deprecation.

          ant test-patch output for the patch is :

               [exec] +1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 4 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          
          Show
          Sreekanth Ramakrishnan added a comment - Changing visibility of the constructor and removing the deprecation. ant test-patch output for the patch is : [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 4 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          Hide
          Sreekanth Ramakrishnan added a comment -

          Attaching screenshot of the scheduling information after the patch has been applied.

          Show
          Sreekanth Ramakrishnan added a comment - Attaching screenshot of the scheduling information after the patch has been applied.
          Hide
          Sreekanth Ramakrishnan added a comment -

          The ant test-contrib passed on the local machine.

          Show
          Sreekanth Ramakrishnan added a comment - The ant test-contrib passed on the local machine.
          Hide
          Hemanth Yamijala added a comment -

          I just committed this. Thanks, Sreekanth !

          Show
          Hemanth Yamijala added a comment - I just committed this. Thanks, Sreekanth !
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #680 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/680/ )
          Hide
          Robert Chansler added a comment -

          Edit release note for publication.

          Show
          Robert Chansler added a comment - Edit release note for publication.

            People

            • Assignee:
              Sreekanth Ramakrishnan
              Reporter:
              Hemanth Yamijala
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development