After a job is submitted, HOD queries torque periodically until it finds the job to be running / completed (due to error). The initial rate of query is once every 0.5 seconds for 20 times, and then once every 10 seconds. This is probably a tad too aggressive as we find that Torque sometimes returns some odd errors under heavy load in the cluster (
HADOOP-3216). It may be better to query at a more relaxed rate.