Hadoop Common / HADOOP-5801

JobTracker should refresh the hosts list upon recovery

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      If the hosts file changes across a restart, it should be refreshed upon recovery so that the excluded hosts are marked lost and their maps are re-executed.
      The mapred-hosts list can change across restarts. Once the jobtracker recovers, it detects all the tasktrackers that appear in the job history. If the hosts list has changed, the jobtracker still holds the tasktracker data internally but disallows the tracker when it makes contact. As a result, the jobtracker has to wait for the tracker to time out before it can re-execute the tasks. This patch simply refreshes the node list upon recovery so that the invalid trackers are lost immediately.
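
      The behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the code from the attached patch: the class RecoverySketch and its methods recoverFromHistory and refreshHosts are hypothetical names chosen for illustration, and the rule that an empty include list admits every host is an assumption of this sketch. The idea shown is only that, once recovery has rebuilt the tracker bookkeeping from history, re-reading the hosts list lets disallowed trackers be declared lost right away, so their tasks are queued for re-execution without waiting for a timeout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of a jobtracker's tracker bookkeeping.
// None of these names are taken from the actual patch.
class RecoverySketch {
    // tracker host -> ids of tasks that ran there, as replayed from job history
    private final Map<String, List<String>> tasksByTracker = new HashMap<>();
    // tasks queued for re-execution
    private final Set<String> pendingTasks = new TreeSet<>();

    // Recovery step: rebuild the tracker map from the (simulated) job history.
    void recoverFromHistory(Map<String, List<String>> history) {
        for (Map.Entry<String, List<String>> e : history.entrySet()) {
            tasksByTracker.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
    }

    // Refresh step, run right after recovery: any known tracker that the
    // current hosts lists no longer allow is declared lost immediately and
    // its tasks are queued again. Assumption: an empty include list means
    // "all hosts allowed".
    Set<String> refreshHosts(Set<String> includes, Set<String> excludes) {
        Set<String> lost = new TreeSet<>();
        for (String host : tasksByTracker.keySet()) {
            boolean included = includes.isEmpty() || includes.contains(host);
            if (!included || excludes.contains(host)) {
                lost.add(host);
            }
        }
        // Remove after the scan to avoid modifying the map while iterating.
        for (String host : lost) {
            pendingTasks.addAll(tasksByTracker.remove(host));
        }
        return lost;
    }

    Set<String> pendingTasks() {
        return pendingTasks;
    }

    Set<String> knownTrackers() {
        return new TreeSet<>(tasksByTracker.keySet());
    }

    public static void main(String[] args) {
        Map<String, List<String>> history = new HashMap<>();
        history.put("tt1", Arrays.asList("m1"));
        history.put("tt2", Arrays.asList("m2", "m3"));

        RecoverySketch jt = new RecoverySketch();
        jt.recoverFromHistory(history);

        // tt2 was added to the exclude list while the jobtracker was down.
        Set<String> lost = jt.refreshHosts(new HashSet<String>(),
                                           Collections.singleton("tt2"));
        System.out.println("lost trackers: " + lost);
        System.out.println("tasks to re-execute: " + jt.pendingTasks());
    }
}
```

      Without the refresh step, tt2 in this sketch would stay in the tracker map until some separate expiry mechanism removed it, which is the delay the issue describes.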

      1. 5801-0.20.patch
        5 kB
        Robert Chansler
      2. HADOOP-5801-v1.2.patch
        5 kB
        Amar Kamat

        Activity

        Amar Kamat created issue -
        Amar Kamat added a comment -

        Attaching a patch that refreshes the hosts list after recovery is done. Result of test-patch:

             [exec] +1 overall.
             [exec] 
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec] 
             [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
             [exec] 
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec] 
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec] 
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec] 
             [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
             [exec] 
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        

        Running ant test now.

        Amar Kamat made changes -
        Field Original Value New Value
        Attachment HADOOP-5801-v1.2.patch [ 12407843 ]
        Amar Kamat added a comment -

        ant tests passed on my box.

        Devaraj Das added a comment -

        I just committed this. Thanks, Amar!

        Devaraj Das made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Fix Version/s 0.21.0 [ 12313563 ]
        Resolution Fixed [ 1 ]
        Robert Chansler added a comment -

        Attached an example patch for 0.20; not to be committed.

        Robert Chansler made changes -
        Attachment 5801-0.20.patch [ 12409834 ]
        Amar Kamat made changes -
        Release Note The mapred-hosts list can change across restarts. Once the jobtracker recovers, it detects all the tasktrackers that appear in the job history. If the hosts list has changed, the jobtracker still holds the tasktracker data internally but disallows the tracker when it makes contact. As a result, the jobtracker has to wait for the tracker to time out before it can re-execute the tasks. This patch simply refreshes the node list upon recovery so that the invalid trackers are lost immediately.
        Hudson added a comment -

        Integrated in Hadoop-trunk #863 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/863/ )

        Owen O'Malley made changes -
        Component/s mapred [ 12310690 ]
        Robert Chansler added a comment -

        Editorial pass over all release notes prior to publication of 0.21. Routine. Bug.

        Robert Chansler made changes -
        Release Note The mapred-hosts list can change across restarts. Once the jobtracker recovers, it detects all the tasktrackers that appear in the job history. If the hosts list has changed, the jobtracker still holds the tasktracker data internally but disallows the tracker when it makes contact. As a result, the jobtracker has to wait for the tracker to time out before it can re-execute the tasks. This patch simply refreshes the node list upon recovery so that the invalid trackers are lost immediately.
        Description If the hosts file changes across a restart, it should be refreshed upon recovery so that the excluded hosts are marked lost and their maps are re-executed.
        The mapred-hosts list can change across restarts. Once the jobtracker recovers, it detects all the tasktrackers that appear in the job history. If the hosts list has changed, the jobtracker still holds the tasktracker data internally but disallows the tracker when it makes contact. As a result, the jobtracker has to wait for the tracker to time out before it can re-execute the tasks. This patch simply refreshes the node list upon recovery so that the invalid trackers are lost immediately.
        Tom White made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition           Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open -> Resolved     8d 15h 35m              1                 Devaraj Das     20/May/09 05:25
        Resolved -> Closed   461d 16h 11m            1                 Tom White       24/Aug/10 21:37

          People

          • Assignee:
            Amar Kamat
            Reporter:
            Amar Kamat
          • Votes:
            0
            Watchers:
            1
