Gridmix maintains a list (L) of running jobs via JobMonitor. As soon as a job is submitted, a handle for that job is cached inside the JobMonitor. The JobMonitor does the following in a thread:
Gridmix's STRESS mode logic uses the list L to compute the cluster load. It iterates over the map and reduce progress of every job in L to work out the pending plus running task count. We need to investigate and optimize the JobMonitor algorithm so that the number of completed jobs in L is kept to a minimum. Polling the map and reduce task progress of a completed job is expensive because it incurs an additional RPC to the JobHistory server.
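One way to keep completed jobs out of the load calculation is sketched below. This is a minimal illustration, not Gridmix's actual code: JobHandle, FixedJob, RunningJobList and all of their methods are hypothetical names, and the pending plus running estimate (total tasks scaled by the unfinished fraction of progress) is an assumed approximation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a cached job handle; not the Gridmix or Hadoop API.
interface JobHandle {
    boolean isComplete();
    int totalMaps();
    int totalReduces();
    float mapProgress();    // fraction in [0, 1]
    float reduceProgress(); // fraction in [0, 1]
}

// Fixed-value handle used only to exercise the sketch.
final class FixedJob implements JobHandle {
    private final boolean complete;
    private final int maps, reduces;
    private final float mapProgress, reduceProgress;

    FixedJob(boolean complete, int maps, int reduces, float mapProgress, float reduceProgress) {
        this.complete = complete;
        this.maps = maps;
        this.reduces = reduces;
        this.mapProgress = mapProgress;
        this.reduceProgress = reduceProgress;
    }

    public boolean isComplete() { return complete; }
    public int totalMaps() { return maps; }
    public int totalReduces() { return reduces; }
    public float mapProgress() { return mapProgress; }
    public float reduceProgress() { return reduceProgress; }
}

// Hypothetical version of the list L that drops completed jobs eagerly.
final class RunningJobList {
    private final List<JobHandle> jobs = new ArrayList<>();

    synchronized void add(JobHandle job) { jobs.add(job); }

    synchronized int size() { return jobs.size(); }

    // Remove completed jobs so later load calculations never poll them.
    synchronized int pruneCompleted() {
        int removed = 0;
        for (Iterator<JobHandle> it = jobs.iterator(); it.hasNext();) {
            if (it.next().isComplete()) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // Estimate pending + running tasks; completed jobs are skipped before
    // any progress call, since that call is the expensive one for them.
    synchronized int incompleteTasks() {
        int total = 0;
        for (JobHandle job : jobs) {
            if (job.isComplete()) {
                continue;
            }
            total += remaining(job.totalMaps(), job.mapProgress());
            total += remaining(job.totalReduces(), job.reduceProgress());
        }
        return total;
    }

    // Assumed approximation: unfinished fraction of the task count, rounded up.
    private static int remaining(int totalTasks, float progress) {
        return (int) Math.ceil(totalTasks * (1.0 - progress));
    }
}
```

In this sketch the monitoring thread would call pruneCompleted() on each pass, so the STRESS-mode load calculation only ever sees jobs that are still running. The isComplete() guard in incompleteTasks() covers jobs that finish between two pruning passes.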