
Key: HADOOP-3864
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Critical
Assignee: Arun C Murthy
Reporter: Arun C Murthy
Votes: 0
Watchers: 1
Hadoop Common

JobTracker lockup due to JobInProgress.initTasks taking significant time for large jobs on large clusters

Created: 30/Jul/08 06:06 PM   Updated: 08/Jul/09 04:52 PM
Component/s: None
Affects Version/s: 0.18.0
Fix Version/s: 0.19.0

Time Tracking:
Not Specified

File Attachments:
  HADOOP-3864_0_20080830.patch (text file, 4 kB), attached 2008-07-31 02:15 AM by Arun C Murthy, licensed for inclusion in ASF works

Hadoop Flags: Reviewed
Resolution Date: 08/Aug/08 11:42 PM


Description
JobInProgress.initTasks takes a significant amount of time for large jobs on a large cluster (55k maps * 3 splits), and the JobInProgress object stays locked for that entire time.

Meanwhile the JobClient calls JobTracker.getTaskCompletionEvents, which locks the JobTracker and then tries to lock the JobInProgress. While it waits for the JobInProgress lock it keeps holding the JobTracker lock, thereby starving all heartbeats, which also need to lock the JobTracker. The result is a lockup.
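
The lock chain described above can be modelled in a few lines of Java. This is only a sketch of the pattern and not the code from the attached patch: trackerLock and jobLock are hypothetical stand-ins for the JobTracker and JobInProgress monitors, the class and method names are invented for illustration, and the second branch (release the tracker lock before waiting on the job lock) is one possible way to break the chain, assumed here for contrast rather than taken from the fix.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the reported lock chain; not Hadoop code.
class LockConvoySketch {
    // Stand-ins for the JobTracker and JobInProgress monitors.
    static final ReentrantLock trackerLock = new ReentrantLock();
    static final ReentrantLock jobLock = new ReentrantLock();

    /** Returns true if a simulated heartbeat obtains the tracker lock within 200 ms. */
    static boolean heartbeatGetsThrough(boolean holdTrackerWhileWaitingForJob)
            throws InterruptedException {
        CountDownLatch initStarted = new CountDownLatch(1);
        CountDownLatch releaseInit = new CountDownLatch(1);
        CountDownLatch clientWaiting = new CountDownLatch(1);

        // Plays the role of initTasks: holds the job lock until told to finish.
        Thread init = new Thread(() -> {
            jobLock.lock();
            try {
                initStarted.countDown();
                releaseInit.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                jobLock.unlock();
            }
        });

        // Plays the role of getTaskCompletionEvents.
        Thread client = new Thread(() -> {
            if (holdTrackerWhileWaitingForJob) {
                // Reported pattern: tracker lock is held while blocking on the job lock.
                trackerLock.lock();
                try {
                    clientWaiting.countDown();
                    jobLock.lock();
                    jobLock.unlock();
                } finally {
                    trackerLock.unlock();
                }
            } else {
                // Assumed alternative: touch the tracker briefly, then wait on the job alone.
                trackerLock.lock();
                trackerLock.unlock();
                clientWaiting.countDown();
                jobLock.lock();
                jobLock.unlock();
            }
        });

        init.start();
        initStarted.await();
        client.start();
        clientWaiting.await();

        // Plays the role of a heartbeat, which needs the tracker lock.
        boolean acquired = trackerLock.tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) {
            trackerLock.unlock();
        }

        // Let the simulated initTasks finish so both threads can exit.
        releaseInit.countDown();
        init.join();
        client.join();
        return acquired;
    }
}
```

Calling heartbeatGetsThrough(true) reproduces the starvation: the heartbeat times out because the client thread holds the tracker lock while it is parked on the job lock. Calling heartbeatGetsThrough(false) lets the heartbeat through even though the simulated initTasks is still running.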



Each entry below lists: Field: Original Value -> New Value

Arun C Murthy made changes - 31/Jul/08 02:15 AM
  Attachment: (none) -> HADOOP-3864_0_20080830.patch [ 12387251 ]
Arun C Murthy made changes - 31/Jul/08 02:15 AM
  Status: Open [ 1 ] -> Patch Available [ 10002 ]
Owen O'Malley made changes - 08/Aug/08 11:42 PM
  Resolution: (none) -> Fixed [ 1 ]
  Hadoop Flags: (none) -> [Reviewed]
  Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
Nigel Daley made changes - 20/Nov/08 11:38 PM
  Status: Resolved [ 5 ] -> Closed [ 6 ]
Owen O'Malley made changes - 08/Jul/09 04:52 PM
  Component/s: (none) -> mapred [ 12310690 ]