Description
On one of the larger clusters, with 2000 nodes, the JobTracker (JT) hung quite often, sometimes for 10-15 minutes and once for an hour and a half. The stack trace shows that JobInProgress.obtainTaskCleanupTask() is waiting for the lock on the JobInProgress object, which JobInProgress.initTasks() holds for a long time while it waits on DFS operations.
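The stack trace describes a common monitor-contention pattern: one thread holds an object's lock across slow I/O, and every other thread that needs the same lock stalls for the full duration of that I/O. The following is a minimal sketch of that pattern under stated assumptions. `JobLockSketch`, `slowDfsRead`, `initTasksHoldingLock`, `initTasksOutsideLock`, `obtainCleanupTask` and `waitMillis` are all hypothetical stand-ins, not Hadoop's actual `JobInProgress` code, and DFS latency is simulated with a sleep. The sketch contrasts holding the lock during I/O with one common mitigation (do the slow read first, then take the lock only to publish the result). It is not necessarily the fix that was adopted for this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for JobInProgress; not Hadoop's real class.
public class JobLockSketch {
    private final List<String> tasks = new ArrayList<>();

    // Hypothetical stand-in for a slow DFS read; signals when it starts, then sleeps.
    static List<String> slowDfsRead(long millis, CountDownLatch started) throws InterruptedException {
        started.countDown();
        TimeUnit.MILLISECONDS.sleep(millis);
        List<String> splits = new ArrayList<>();
        splits.add("split-0");
        splits.add("split-1");
        return splits;
    }

    // Anti-pattern: the object's monitor is held for the whole simulated DFS call.
    public synchronized void initTasksHoldingLock(long dfsMillis, CountDownLatch started)
            throws InterruptedException {
        tasks.addAll(slowDfsRead(dfsMillis, started));
    }

    // Mitigation: perform the slow read unlocked, then lock only to publish the result.
    public void initTasksOutsideLock(long dfsMillis, CountDownLatch started)
            throws InterruptedException {
        List<String> splits = slowDfsRead(dfsMillis, started);
        synchronized (this) {
            tasks.addAll(splits);
        }
    }

    // Hypothetical stand-in for obtainTaskCleanupTask(): needs the same monitor briefly.
    public synchronized int obtainCleanupTask() {
        return tasks.size();
    }

    // Returns how long (ms) another thread waits for the monitor while init is in its DFS call.
    static long waitMillis(boolean holdLockDuringIo, long dfsMillis) throws Exception {
        JobLockSketch job = new JobLockSketch();
        CountDownLatch started = new CountDownLatch(1);
        Thread init = new Thread(() -> {
            try {
                if (holdLockDuringIo) {
                    job.initTasksHoldingLock(dfsMillis, started);
                } else {
                    job.initTasksOutsideLock(dfsMillis, started);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        init.start();
        started.await(); // init thread is now inside the simulated DFS read
        long start = System.nanoTime();
        job.obtainCleanupTask(); // blocks only if init holds the monitor
        long waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        init.join();
        return waited;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("lock held during I/O: waited ~" + waitMillis(true, 500) + " ms");
        System.out.println("I/O outside lock:     waited ~" + waitMillis(false, 500) + " ms");
    }
}
```

With the lock held during I/O, the second thread waits roughly as long as the simulated DFS call; with the read moved outside the lock, it acquires the monitor almost immediately. The trade-off is that state read outside the lock may need revalidation before it is published.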
Attachments
Issue Links
- incorporates
  - HADOOP-4375 Accesses to CompletedJobStore should not lock the JobTracker. (Closed)
- is related to
  - HADOOP-5483 Directory/file cleanup thread throws IllegalStateException (Closed)