Hemanth and I looked at what's going on here. Essentially, there are two sources of truth regarding the number of running tasks in the system. Each JobInProgress object maintains counts of running map and reduce tasks. These counts are incremented when a task is assigned to a TT (in obtainNewMapTask() or obtainNewReduceTask()), and they are the counts used by the CapacityScheduler. The cluster summary, represented by the ClusterStatus object, also contains counts of the total number of running map and reduce tasks; these are incremented by the JT using the TT status.

The counts maintained by the JobInProgress objects and the ClusterStatus object are off by a heartbeat: the former increments its counts when a task is assigned, but once the task runs on a TT, its running status is conveyed to the JT only in the TT's next heartbeat. During startup, a lot of TTs approach the JT for tasks to run. As a result, the counts of running tasks across all JobInProgress objects are much higher than the cluster count, since the cluster count is updated only when the TTs report their status in their next heartbeat. That explains the discrepancy reported in this Jira. In steady state, the two counts are mostly identical, or off by a little, as TTs finish their tasks at different times.
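To make the off-by-a-heartbeat behavior concrete, here's a minimal, hypothetical sketch (these are stand-ins, not the actual Hadoop classes) of the two counters and when each one moves:

{code:java}
// Hypothetical stand-ins, not the real JobInProgress/ClusterStatus classes.
public class DiscrepancySketch {

    // Job-side count: incremented synchronously when a task is assigned,
    // as happens in obtainNewMapTask()/obtainNewReduceTask().
    static class JobCounts {
        int runningMaps;
        void assignMapTask() {
            runningMaps++; // counted at assignment time
        }
    }

    // Cluster-side count: updated only when a TT reports its status in
    // its *next* heartbeat.
    static class ClusterCounts {
        int runningMaps;
        void heartbeat(int runningMapsReportedByTTs) {
            runningMaps = runningMapsReportedByTTs; // lags assignment by a heartbeat
        }
    }

    public static void main(String[] args) {
        JobCounts job = new JobCounts();
        ClusterCounts cluster = new ClusterCounts();

        // At startup, many TTs ask for work, so assignments happen in a burst...
        for (int i = 0; i < 100; i++) {
            job.assignMapTask();
        }
        // ...but the cluster summary still reflects the last heartbeat.
        System.out.println(job.runningMaps + " vs " + cluster.runningMaps); // 100 vs 0

        // After the TTs' next heartbeat, the two counts converge.
        cluster.heartbeat(100);
        System.out.println(job.runningMaps + " vs " + cluster.runningMaps); // 100 vs 100
    }
}
{code}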
This is not really a bug, as it's not clear which count is 'correct'. We're reporting from two different sources: the cluster summary and the Scheduler (which gets its info from the JobInProgress objects). But different numbers do get reflected in the UI, so the best fix is probably to indicate in the Scheduler part of the UI that its computation is off from the cluster summary by a heartbeat. Maybe a little note at the bottom that says something like: "This info varies from that of the cluster summary by a heartbeat".
I don't think we should change anything in the scheduler or the cluster summary; they're both doing the right thing in their own way. An alternate solution is to have the cluster summary use the counts from the JobInProgress objects, but this is performance-intensive (see the sketch below), and that is presumably why the cluster summary maintains its own count.
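For illustration, here's roughly what the alternate solution would cost. This is hypothetical code (it could sit in the same DiscrepancySketch class above, reusing the JobCounts stand-in): the cluster summary would have to walk every job on each request, instead of maintaining a single count updated from heartbeats.

{code:java}
// Hypothetical: deriving the cluster-wide count from per-job counts.
// This is O(number of jobs) on every call, which is presumably why the
// cluster summary maintains its own count instead.
static int runningMapsAcrossJobs(java.util.List<JobCounts> jobs) {
    int total = 0;
    for (JobCounts job : jobs) {
        total += job.runningMaps;
    }
    return total;
}
{code}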
You do want to leave the rest of the UI as is. The cluster summary is useful, as is the per-queue information on running tasks (reported by the Scheduler), since it lets users know whether the queue is running above, at, or below its guaranteed capacity.
The scheduler maintains only a partial waiting count of map/reduce tasks. For scheduling purposes it doesn't need to know the total number of pending tasks once that total exceeds the cluster capacity, so, for performance reasons, it only counts up to the cluster capacity. Since this partial count can understate the true number of waiting tasks, the waiting counts should be removed from the scheduler information.
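As a rough illustration of why that count is only partial, here's a hedged sketch (a hypothetical method, not the scheduler's actual code) of counting pending tasks only up to the cluster capacity:

{code:java}
// Hypothetical: the scheduler only needs to know whether pending work
// exceeds the cluster capacity, so counting can stop at that cap.
// The result is therefore a lower bound, not the true pending total.
static int cappedWaitingCount(int[] pendingTasksPerJob, int clusterCapacity) {
    int count = 0;
    for (int pending : pendingTasksPerJob) {
        count += pending;
        if (count >= clusterCapacity) {
            return clusterCapacity; // anything beyond capacity doesn't matter
        }
    }
    return count;
}
{code}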
HADOOP-4576 has been opened for this purpose and suggests that we display pending jobs instead of pending tasks, as the former seems more useful to users.