Hadoop Common / HADOOP-5392

JobTracker crashes during recovery if job files are garbled

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.19.2
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The JobTracker crashed in the recovery stage for a job with a 0-byte job.xml. Ideally, the JobTracker should try to recover as many jobs as possible.
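
      One way to get the behavior described above is to isolate each job's recovery in its own try-catch, so that a garbled job file causes only that job to be skipped instead of aborting the whole recovery. The sketch below is illustrative only and is not the code from the attached patches: RecoverySketch, recoverJobs, and validate are hypothetical names, and an in-memory map stands in for the job files on disk.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RecoverySketch {
    /**
     * Returns the IDs of jobs whose job file passed validation.
     * A job with an unusable file is logged and skipped; it does not stop the loop.
     */
    static List<String> recoverJobs(Map<String, byte[]> jobFiles) {
        List<String> recovered = new ArrayList<String>();
        for (Map.Entry<String, byte[]> entry : jobFiles.entrySet()) {
            try {
                validate(entry.getValue());
                recovered.add(entry.getKey());
            } catch (IOException e) {
                // Assumption: skipping the job is acceptable; recovery continues with the next one.
                System.err.println("Skipping job " + entry.getKey() + ": " + e.getMessage());
            }
        }
        return recovered;
    }

    /** Hypothetical check: treats a missing or 0-byte job file as garbled. */
    static void validate(byte[] contents) throws IOException {
        if (contents == null || contents.length == 0) {
            throw new IOException("job file is empty");
        }
    }
}
```

      With three jobs where the second has a 0-byte file, recoverJobs returns the first and third job IDs and reports the second on standard error.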

      1. HADOOP-5392-v2.7.patch
        12 kB
        Amar Kamat
      2. HADOOP-5392-v2.6.patch
        12 kB
        Amar Kamat
      3. HADOOP-5392-v2.3.patch
        10 kB
        Amar Kamat
      4. HADOOP-5392-v2.1.patch
        10 kB
        Amar Kamat

        Activity

        Amar Kamat added a comment -

        HADOOP-4638 added this piece of code

        synchronized (trackerToJobsToCleanup) {
          Set<JobID> jobs = trackerToJobsToCleanup.get(trackerName);
          jobs.add(taskId.getJobID());
        }
        

        Here, jobs can be null, so a null check should be added. This issue was detected during HADOOP-5392 testing.
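
        A minimal sketch of the kind of null check suggested above, not the committed patch. The class CleanupTracker and its method names are hypothetical, and String stands in for JobID to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CleanupTracker {
    // Stand-in for the JobTracker's trackerToJobsToCleanup map (assumption: String replaces JobID).
    private final Map<String, Set<String>> trackerToJobsToCleanup =
        new HashMap<String, Set<String>>();

    /** Records a job for cleanup on a tracker, creating the set on first use. */
    void addJobForCleanup(String trackerName, String jobId) {
        synchronized (trackerToJobsToCleanup) {
            Set<String> jobs = trackerToJobsToCleanup.get(trackerName);
            if (jobs == null) {
                // Without this check, jobs.add(...) throws a NullPointerException
                // for a tracker that has no entry yet.
                jobs = new TreeSet<String>();
                trackerToJobsToCleanup.put(trackerName, jobs);
            }
            jobs.add(jobId);
        }
    }

    /** Returns a copy of the jobs recorded for a tracker; empty if there are none. */
    Set<String> jobsFor(String trackerName) {
        synchronized (trackerToJobsToCleanup) {
            Set<String> jobs = trackerToJobsToCleanup.get(trackerName);
            return jobs == null ? new TreeSet<String>() : new TreeSet<String>(jobs);
        }
    }
}
```

        Whether the real fix should create the missing set or simply skip the add depends on how the JobTracker populates the map, which this thread does not show.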

        Amar Kamat added a comment -

        Attaching a patch that fixes the issue. Added a test case to validate it. Result of test-patch:

        [exec] +1 overall.
        [exec]
        [exec]     +1 @author.  The patch does not contain any @author tags.
        [exec]
        [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
        [exec]
        [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
        [exec]
        [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
        [exec]
        [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        
        Amareshwari Sriramadasu added a comment -

        One comment: you can move everything in the first while loop into a single try-catch block.

        Otherwise, the patch looks fine.

        Amar Kamat added a comment -

        Attaching a patch incorporating Amareshwari's comments. Test-patch result:

        [exec] +1 overall.
        [exec]
        [exec]     +1 @author.  The patch does not contain any @author tags.
        [exec]
        [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
        [exec]
        [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
        [exec]
        [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
        [exec]
        [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        
        Amar Kamat added a comment -

        Submitting. Patch applies to both trunk and 0.20.

        Amareshwari Sriramadasu added a comment -

        ant test passed on local machine.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12401600/HADOOP-5392-v2.3.patch
        against trunk revision 751463.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/34/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/34/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/34/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/34/console

        This message is automatically generated.

        Amar Kamat added a comment -

        Attaching a patch incorporating Devaraj's offline comments. Result of test-patch:

        [exec] +1 overall.
        [exec]
        [exec]     +1 @author.  The patch does not contain any @author tags.
        [exec]
        [exec]     +1 tests included.  The patch appears to include 6 new or modified tests.
        [exec]
        [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
        [exec]
        [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
        [exec]
        [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        
        Devaraj Das added a comment -

        +1 on the patch. Please ensure that "ant test" passes with the patch.
        BTW, while reviewing this patch, I noticed that the JobHistory calls use the user's jobconf to create/read history file paths for both hadoop.job.history.location and hadoop.job.history.user.location. This should be fixed (in a separate JIRA) so that the JobTracker's conf is used for the history files under hadoop.job.history.location.

        Amar Kamat added a comment -

        Attaching a patch incorporating Devaraj's comments. Ant test passes on my box.

        Devaraj Das added a comment -

        I just committed this to the 0.19, 0.20 branches and trunk. Thanks, Amar!

        Hudson added a comment -

        Integrated in Hadoop-trunk #778 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/778/ )

          People

          • Assignee:
            Amar Kamat
          • Reporter:
            Amar Kamat