Details
-
Bug
-
Status: Closed
-
Blocker
-
Resolution: Fixed
-
0.16.2
-
None
Description
After a job is submitted, HOD queries torque periodically until it finds the job to be running / completed (due to error). The initial rate of query is once every 0.5 seconds for 20 times, and then once every 10 seconds. This is probably a tad too aggressive as we find that Torque sometimes returns some odd errors under heavy load in the cluster (HADOOP-3216). It may be better to query at a more relaxed rate.