Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
0.18.0
None
None
Reviewed
Description
JobInProgress.initTasks takes a significant amount of time on a large cluster for large jobs (55k maps * 3 splits), and the JobInProgress object stays locked for that whole time.
Meanwhile, the JobClient calls JobTracker.getTaskCompletionEvents, which locks the JobTracker and then tries to lock the JobInProgress. It blocks on the JobInProgress while still holding the JobTracker lock, thereby starving all heartbeats that are trying to lock the JobTracker. The result is a lockup.
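The lock interaction can be reproduced in a few lines. This is a minimal, hypothetical sketch, not Hadoop code: `trackerLock` stands in for the JobTracker lock, `jobLock` for the JobInProgress lock, and the class and method names are invented for illustration. One thread plays a long-running initTasks, one plays the JobClient's getTaskCompletionEvents call, and the main thread plays a heartbeat that tries to take the tracker lock. The `false` branch, which releases the tracker lock before touching the job, is included only to contrast the two lock patterns. It is an assumption for illustration, not the fix that was applied for this issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the lockup; none of these names are Hadoop APIs.
public class LockupSketch {

    // Returns true if a "heartbeat" could take trackerLock while "initTasks" held jobLock.
    public static boolean heartbeatGetsTrackerLock(boolean holdTrackerWhileWaitingForJob)
            throws InterruptedException {
        ReentrantLock trackerLock = new ReentrantLock(); // stands in for the JobTracker lock
        ReentrantLock jobLock = new ReentrantLock();     // stands in for the JobInProgress lock
        CountDownLatch initHoldsJob = new CountDownLatch(1);
        CountDownLatch clientStarted = new CountDownLatch(1);
        CountDownLatch finishInit = new CountDownLatch(1);

        // Models a slow initTasks: holds the job lock until told to finish.
        Thread init = new Thread(() -> {
            jobLock.lock();
            try {
                initHoldsJob.countDown();
                finishInit.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                jobLock.unlock();
            }
        });

        // Models the JobClient calling getTaskCompletionEvents.
        Thread client = new Thread(() -> {
            trackerLock.lock();
            if (holdTrackerWhileWaitingForJob) {
                clientStarted.countDown();
                jobLock.lock();        // blocks behind init while still holding trackerLock
                jobLock.unlock();
                trackerLock.unlock();
            } else {
                trackerLock.unlock();  // contrast case: drop the tracker lock first
                clientStarted.countDown();
                jobLock.lock();        // still waits for init, but holds no tracker lock
                jobLock.unlock();
            }
        });

        init.start();
        initHoldsJob.await();
        client.start();
        clientStarted.await();

        // Models a TaskTracker heartbeat trying to enter the JobTracker.
        boolean acquired = trackerLock.tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) {
            trackerLock.unlock();
        }

        finishInit.countDown(); // let "initTasks" complete so both threads can exit
        init.join();
        client.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        // prints "nested locking, heartbeat acquired tracker lock: false"
        System.out.println("nested locking, heartbeat acquired tracker lock: "
                + heartbeatGetsTrackerLock(true));
        // prints "tracker released first, heartbeat acquired tracker lock: true"
        System.out.println("tracker released first, heartbeat acquired tracker lock: "
                + heartbeatGetsTrackerLock(false));
    }
}
```

In the nested case the heartbeat's `tryLock` times out because the client thread holds the tracker lock for as long as initTasks holds the job lock, which is the starvation described above. Whether dropping the outer lock early is safe in the real JobTracker depends on what state that lock protects, so the contrast branch should not be read as a drop-in change.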