Hadoop Common / HADOOP-4638

Exception thrown in/from RecoveryManager.recover() should be caught and handled

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: 0.19.2, 0.20.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      RecoveryManager.recover() can throw an exception while recovering a job. Since the JobTracker calls RecoveryManager.recover() from offerService(), any failure in recovery will cause the JobTracker to crash. Ideally, the RecoveryManager should log the failure encountered while recovering the job and continue.
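
To illustrate the behaviour requested in the description, here is a minimal Java sketch. It is not the attached patch and does not use the real JobTracker classes: RecoveryLoopSketch, JobRecoverer and recoverAll are hypothetical names chosen for this example. It only shows the pattern of catching a failure per job, logging it, and moving on, so that one unrecoverable job cannot take down the caller.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for the recovery loop; not the actual JobTracker code. */
final class RecoveryLoopSketch {
    private static final Logger LOG =
        Logger.getLogger(RecoveryLoopSketch.class.getName());

    /** Hypothetical per-job recovery step that may fail. */
    interface JobRecoverer {
        void recover(String jobId) throws Exception;
    }

    /**
     * Tries to recover every job. A failure is logged and the job is skipped,
     * so the exception never propagates to the caller.
     * Returns the ids of the jobs that were recovered.
     */
    static List<String> recoverAll(List<String> jobIds, JobRecoverer recoverer) {
        List<String> recovered = new ArrayList<>();
        for (String jobId : jobIds) {
            try {
                recoverer.recover(jobId);
                recovered.add(jobId);
            } catch (Exception e) {
                // Log and continue with the next job instead of crashing.
                LOG.log(Level.WARNING, "Could not recover job " + jobId + ", ignoring it", e);
            }
        }
        return recovered;
    }

    public static void main(String[] args) {
        // job_1 fails during recovery; job_2 recovers normally.
        List<String> recovered = recoverAll(Arrays.asList("job_1", "job_2"), jobId -> {
            if (jobId.equals("job_1")) {
                throw new IOException("simulated recovery failure");
            }
        });
        System.out.println("Recovered jobs: " + recovered);
    }
}
```

The try/catch sits inside the loop rather than around it, which is what lets the remaining jobs be attempted after a failure.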

      1. HADOOP-4638-v1.8.5.patch
        12 kB
        Amar Kamat
      2. HADOOP-4638-v1.8.patch
        14 kB
        Amar Kamat
      3. HADOOP-4638-v1.6.patch
        13 kB
        Amar Kamat
      4. HADOOP-4638-v1.3.patch
        13 kB
        Amar Kamat
      5. HADOOP-4638-v1.1.patch
        12 kB
        Amar Kamat

        Activity

        Amar Kamat created issue -
        Amar Kamat added a comment -

        Attaching a patch that prevents the RecoveryManager from taking down the JobTracker. Added a test case to cover it.

        Amar Kamat made changes -
        Field Original Value New Value
        Attachment HADOOP-4638-v1.1.patch [ 12393796 ]
        Amar Kamat made changes -
        Assignee Amar Kamat [ amar_kamat ]
        Amar Kamat added a comment -

        Attaching a new patch.

        Amar Kamat made changes -
        Attachment HADOOP-4638-v1.3.patch [ 12400275 ]
        Amar Kamat added a comment -

        Result of test-patch:

        [exec] +1 overall.
        [exec]
        [exec]     +1 @author.  The patch does not contain any @author tags.
        [exec]
        [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
        [exec]
        [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
        [exec]
        [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
        [exec]
        [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        
        
        Amar Kamat made changes -
        Attachment HADOOP-4638-v1.6.patch [ 12400852 ]
        Amar Kamat made changes -
        Priority Major [ 3 ] Blocker [ 1 ]
        Affects Version/s 0.20.0 [ 12313438 ]
        Amar Kamat added a comment -

        Attaching a new patch. Result of test-patch:

        [exec] +1 overall.
        [exec]
        [exec]     +1 @author.  The patch does not contain any @author tags.
        [exec]
        [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
        [exec]
        [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
        [exec]
        [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
        [exec]
        [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        

        Ant test passes on my box.

        Amar Kamat made changes -
        Attachment HADOOP-4638-v1.8.patch [ 12401009 ]
        Hemanth Yamijala made changes -
        Fix Version/s 0.20.0 [ 12313438 ]
        Amareshwari Sriramadasu added a comment -

        If the JobTracker fails to recover a job, it should do a job.kill() (essentially kill the job). This will kill all the TIPs and do a finalizeJob(). Then the patch doesn't need the change to updateTaskStatuses.

        Amar Kamat added a comment -

        Amareshwari,
        The job gets ignored before it gets added to the JobTracker (i.e., if the file name is not recoverable or restoration of the master file fails). Once the file name is recovered, the recovery manager recovers whatever it can and continues. No killing is done after that, so there is no need to do a job.kill().

        Amar Kamat added a comment -

        The change in updateTaskStatuses handles the case where a job that was running under the earlier JobTracker goes missing or undetected during recovery in the new JobTracker. Every job that is missing from the JobTracker should be removed from the TaskTracker, so the change is required.
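
The idea in this comment can be sketched roughly as follows. This is an assumption-laden illustration, not the code in the patch: MissingJobCleanupSketch, TaskReport and tasksToCleanUp are hypothetical names, and the real updateTaskStatuses works on Hadoop's own status types. The sketch only shows the decision being described: any task reported by a TaskTracker whose job is unknown to the restarted JobTracker is selected for removal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration; the real logic lives in the JobTracker. */
final class MissingJobCleanupSketch {

    /** Hypothetical, simplified task report: a task id and the id of its job. */
    static final class TaskReport {
        final String taskId;
        final String jobId;

        TaskReport(String taskId, String jobId) {
            this.taskId = taskId;
            this.jobId = jobId;
        }
    }

    /**
     * Returns the ids of reported tasks whose job is not known to the
     * (restarted) JobTracker, i.e. tasks the TaskTracker should drop.
     */
    static List<String> tasksToCleanUp(List<TaskReport> reports, Set<String> knownJobs) {
        List<String> stale = new ArrayList<>();
        for (TaskReport report : reports) {
            if (!knownJobs.contains(report.jobId)) {
                stale.add(report.taskId);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // Only job_2 survived recovery; job_1 went missing.
        Set<String> knownJobs = new HashSet<>(Arrays.asList("job_2"));
        List<TaskReport> reports = Arrays.asList(
            new TaskReport("task_1_a", "job_1"),
            new TaskReport("task_2_a", "job_2"));
        System.out.println("Tasks to clean up: " + tasksToCleanUp(reports, knownJobs));
    }
}
```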

        Amareshwari Sriramadasu added a comment -

        Framework changes look good.
        Some comments on the test case:
        1. testJobTracker() does not have any assertions.
        2. The comments for testRecoveryManager() do not match the implementation. Also, can you add assertions that Job1 is ignored and Job2 succeeds?

        Amar Kamat added a comment -

        Incorporated Amareshwari's comments. Result of test-patch:

        [exec] +1 overall.
        [exec]
        [exec]     +1 @author.  The patch does not contain any @author tags.
        [exec]
        [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
        [exec]
        [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
        [exec]
        [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
        [exec]
        [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        
        Amar Kamat made changes -
        Attachment HADOOP-4638-v1.8.5.patch [ 12401235 ]
        Amar Kamat made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Amareshwari Sriramadasu added a comment -

        +1 patch looks good.

        Hemanth Yamijala added a comment -

        I committed this to trunk and the 0.19 and 0.20 branches. Thanks, Amar!

        Hemanth Yamijala made changes -
        Fix Version/s 0.19.2 [ 12313650 ]
        Resolution Fixed [ 1 ]
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Hudson added a comment -

        Integrated in Hadoop-trunk #778 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/778/)

        Nigel Daley made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Owen O'Malley made changes -
        Component/s mapred [ 12310690 ]
        Transition                     Time In Source Status   Execution Times   Last Executer      Last Execution Date
        Open -> Patch Available        110d 2h 51m             1                 Amar Kamat         02/Mar/09 12:11
        Patch Available -> Resolved    2h 36m                  1                 Hemanth Yamijala   02/Mar/09 14:48
        Resolved -> Closed             52d 4h 29m              1                 Nigel Daley        23/Apr/09 20:17

          People

          • Assignee:
            Amar Kamat
            Reporter:
            Amar Kamat
          • Votes:
            0
            Watchers:
            3
