This is not my patch; it was pointed out internally by a dev, but nobody followed up. Uploading it here to see if it makes sense. Copying his comment below.
I tried to reproduce this on a small cluster (60 nodes) with Hadoop 0.20.202.
Steps to reproduce the issue:
1. Set up a file sink for the JobTracker, so that we can read the waiting_maps counter from a file.
2. Use queue configs similar to those in production (I took the **** config).
3. Submit a simple sort job with -Dmapred.job.queue.name="search_general", where this
queue is not present in the cluster. The waiting_maps counter then goes
negative. Example given below:
1298346748441 mapred.jobtracker: context=mapred, sessionId=,
hostName=, waiting_maps=-120, waiting_reduces=-16,
Analysis: waiting_maps is incremented in JobInProgress.initTasks(). If the job
hits an exception before its tasks are initialized, JobInProgress still
decrements waiting_maps in garbageCollect(). This is what drives waiting_maps
and waiting_reduces negative.
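The accounting mismatch described above can be shown with a minimal standalone sketch (this is illustrative code, not the actual Hadoop source): the counter is only ever incremented after task initialization succeeds, but the cleanup path decrements it unconditionally.

```java
// Minimal sketch of the bug: increment happens only on successful init,
// but cleanup decrements unconditionally, so a job that fails before
// initialization drives the counter negative.
public class WaitingMapsBug {
    static int waitingMaps = 0;

    // Stands in for JobInProgress.initTasks(): may throw before
    // the counter is ever incremented (e.g. unknown queue name).
    static void initTasks(int numMaps, boolean failBeforeInit) {
        if (failBeforeInit) {
            throw new RuntimeException("queue does not exist");
        }
        waitingMaps += numMaps;
    }

    // Stands in for JobInProgress.garbageCollect(): decrements
    // without checking whether initTasks() ever completed.
    static void garbageCollect(int numMaps) {
        waitingMaps -= numMaps;
    }

    public static void main(String[] args) {
        try {
            initTasks(120, true); // job rejected before tasks are initialized
        } catch (RuntimeException e) {
            // the JobTracker cleans the job up anyway
        }
        garbageCollect(120);
        System.out.println("waiting_maps=" + waitingMaps); // prints waiting_maps=-120
    }
}
```

With 120 maps the sketch reproduces exactly the -120 value seen in the metrics output above.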
I tried the following code change in JobInProgress to fix this (only the comments are shown here):
//check if tasks are initialized, and decrement waiting_maps accordingly.
// Let the JobTracker know that a job is complete
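The comments above suggest guarding the decrement on whether task initialization completed. A hedged sketch of that guard is below; the field and method names (tasksInited, waitingMaps, waitingReduces) are assumptions for illustration and have not been verified against the actual Hadoop 0.20 JobInProgress source:

```java
// Illustrative sketch only, not the real JobInProgress: decrement the
// waiting counters in garbageCollect() only if initTasks() completed.
public class JobInProgressSketch {
    volatile boolean tasksInited = false; // assumed flag name
    int waitingMaps = 0;
    int waitingReduces = 0;

    void initTasks(int numMaps, int numReduces) {
        // ... real task setup may throw before this point ...
        waitingMaps += numMaps;
        waitingReduces += numReduces;
        tasksInited = true; // set only once the counters are accounted for
    }

    void garbageCollect(int numMaps, int numReduces) {
        // check if tasks are initialized, and decrement waiting_maps accordingly
        if (tasksInited) {
            waitingMaps -= numMaps;
            waitingReduces -= numReduces;
        }
        // Let the JobTracker know that a job is complete
        // ... rest of cleanup unchanged ...
    }
}
```

With this guard, a job that fails before initTasks() completes leaves both counters at zero instead of driving them negative.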
The above logic still needs review from a dev.