Issue Details

Key: HADOOP-3864
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Critical
Assignee: Arun C Murthy
Reporter: Arun C Murthy
Votes: 0
Watchers: 1
Hadoop Common

JobTracker lockup due to JobInProgress.initTasks taking significant time for large jobs on large clusters

Created: 30/Jul/08 06:06 PM   Updated: 08/Jul/09 04:52 PM
Component/s: None
Affects Version/s: 0.18.0
Fix Version/s: 0.19.0

Time Tracking:
Not Specified

File Attachments:
HADOOP-3864_0_20080830.patch (text file, 4 kB), attached 2008-07-31 02:15 AM by Arun C Murthy. Licensed for inclusion in ASF works.

Hadoop Flags: Reviewed
Resolution Date: 08/Aug/08 11:42 PM


Description
JobInProgress.initTasks takes a significant amount of time on a large cluster for large jobs (55k maps * 3 splits), and the JobInProgress object stays locked for the whole of that time.

Simultaneously, the JobClient calls JobTracker.getTaskCompletionEvents, which locks the JobTracker and then tries to lock the JobInProgress. While that call waits for the JobInProgress lock it keeps holding the JobTracker lock, thereby starving all heartbeats, which also need to lock the JobTracker, and resulting in a lockup.
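
For illustration only, the contention described above can be reproduced in a small standalone sketch. This is not JobTracker code: the class name LockupSketch, the two lock fields and the method heartbeatGetsThrough are hypothetical, and ReentrantLock stands in for the synchronized monitors purely so that the "heartbeat" can give up after a timeout instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the lockup; none of these names come from Hadoop.
class LockupSketch {
    // Stand-ins for the JobTracker and JobInProgress monitors.
    static final ReentrantLock trackerLock = new ReentrantLock();
    static final ReentrantLock jobLock = new ReentrantLock();

    /** Returns true if a "heartbeat" obtains the tracker lock within waitMillis. */
    static boolean heartbeatGetsThrough(long waitMillis) {
        CountDownLatch initHoldsJob = new CountDownLatch(1);
        CountDownLatch clientHoldsTracker = new CountDownLatch(1);
        CountDownLatch initDone = new CountDownLatch(1);

        // Models initTasks: holds the job lock until told to finish.
        Thread init = new Thread(() -> {
            jobLock.lock();
            try {
                initHoldsJob.countDown();
                initDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                jobLock.unlock();
            }
        });

        // Models getTaskCompletionEvents: takes the tracker lock, then
        // blocks on the job lock while still holding the tracker lock.
        Thread client = new Thread(() -> {
            trackerLock.lock();
            try {
                clientHoldsTracker.countDown();
                jobLock.lock();
                jobLock.unlock();
            } finally {
                trackerLock.unlock();
            }
        });

        try {
            init.start();
            initHoldsJob.await();
            client.start();
            clientHoldsTracker.await();

            // Models a heartbeat: it only needs the tracker lock, yet it is
            // stuck behind the client, which is stuck behind initialization.
            boolean acquired = trackerLock.tryLock(waitMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                trackerLock.unlock();
            }

            initDone.countDown(); // let "initialization" finish so threads exit
            init.join();
            client.join();
            return acquired;
        } catch (InterruptedException e) {
            initDone.countDown();
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In this model the heartbeat never needs the job lock at all; it is blocked only because the client holds the tracker lock while waiting on the job.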



Arun C Murthy added a comment - 31/Jul/08 02:15 AM
Here is the simplest patch to fix this, rather than the correct path, which is to go fix the whole synchronization mess in the JobTracker (see HADOOP-869). For now, I just turn away the JobClient while the job is being initialized, thereby avoiding the problem...
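
A minimal sketch of the "turn away the client" idea, for readers without the patch at hand. It is not the attached patch: InitGuardSketch, Job, isInited and the String event type are all hypothetical simplifications. The assumption is that the initialized state can be read without taking the job lock (here via a volatile flag), so a caller polling for events returns an empty result immediately instead of queueing behind initialization.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the workaround; not the code in the attached patch.
class InitGuardSketch {
    static final String[] EMPTY_EVENTS = new String[0];

    static class Job {
        // volatile so the flag can be read without taking the job monitor
        private volatile boolean inited = false;
        private final List<String> events = new ArrayList<>();

        // Stands in for the long-running, lock-holding initTasks.
        synchronized void initTasks() {
            // ... expensive split and task setup would happen here ...
            inited = true;
        }

        // Deliberately NOT synchronized: callers must never block here.
        boolean isInited() {
            return inited;
        }

        synchronized void addEvent(String event) {
            events.add(event);
        }

        synchronized String[] getEvents(int from, int max) {
            if (from < 0 || from >= events.size() || max <= 0) {
                return EMPTY_EVENTS;
            }
            int count = Math.min(max, events.size() - from);
            return events.subList(from, from + count).toArray(new String[0]);
        }
    }

    // Tracker-side entry point: turn the client away while the job is
    // still initializing, without touching the job lock.
    static String[] getTaskCompletionEvents(Job job, int fromEventId, int maxEvents) {
        if (job == null || !job.isInited()) {
            return EMPTY_EVENTS;
        }
        return job.getEvents(fromEventId, maxEvents);
    }
}
```

A client that receives an empty array simply polls again later, so nothing is lost by turning it away during initialization.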

Hadoop QA added a comment - 31/Jul/08 06:26 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12387251/HADOOP-3864_0_20080830.patch
against trunk revision 681243.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2991/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2991/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2991/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2991/console

This message is automatically generated.


Owen O'Malley added a comment - 08/Aug/08 11:42 PM
I just committed this. Thanks, Arun!

Hudson added a comment - 22/Aug/08 12:34 PM