Hadoop Map/Reduce / MAPREDUCE-108

Blacklisted hosts may not be able to serve map outputs

    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      After a node fails 4 mappers (tasks), it is added to the blacklist and will no longer accept tasks.
      However, it will continue to serve the map outputs of any mappers that ran there successfully.
      The trouble is that the node may not be able to serve those map outputs either.
      The reducers then mark the corresponding map outputs as coming from a slow host,
      but keep trying to fetch them from that node.
      This may lead to the job waiting forever.
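
      A minimal sketch of the resulting stall, assuming an exponential back-off between retries (the class, names, and numbers below are illustrative, not the actual ReduceTask copier code):

      // Illustrative only: the reducer backs off and retries the same host,
      // and nothing re-schedules the map, so there is no way out of the loop.
      public class StalledFetchSketch {
        public static void main(String[] args) throws InterruptedException {
          boolean mapOutputAvailable = false; // the blacklisted host can no longer serve outputs
          long backoffMs = 1_000;             // hypothetical initial back-off
          int attempts = 0;
          while (!mapOutputAvailable && attempts < 5) { // capped at 5 only so the demo terminates
            attempts++;
            System.out.println("fetch attempt " + attempts + " failed; backing off " + backoffMs + " ms");
            Thread.sleep(backoffMs);
            backoffMs *= 2;                   // the back-off grows, but the fetch is never abandoned
          }
          System.out.println("unless the map is re-executed elsewhere, the real loop never ends");
        }
      }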

      1. HADOOP-2175-v1.1.patch
        14 kB
        Amar Kamat
      2. HADOOP-2175-v1.patch
        2 kB
        Amar Kamat
      3. HADOOP-2175-v2.patch
        11 kB
        Amar Kamat
      4. HADOOP-2175-v2.patch
        11 kB
        Amar Kamat


          Activity

          Arun C Murthy added a comment -

          Runping, the work we did on HADOOP-1158 should take care of situations like this one, no?

          If you are satisfied, I'll close this issue...

          Runping Qi added a comment -

          I was using hadoop 0.15.0.

          Is the patch for HADOOP-1158 in 0.15.0?
          If so, the problem is not solved.

          Arun C Murthy added a comment -

          Yes, HADOOP-1158 went into 0.15.0.

          Unfortunately, with the way the back-offs (the waiting period between retries) are structured, it takes a little while for the map task on the blacklisted node to get killed. Could you wait a bit and see if the map gets killed?

          The back-offs are fixed by HADOOP-1984 and they should help enormously here...

          Runping Qi added a comment -

          Then we need to try the patch for HADOOP-1984.
          Currently, a job may stall forever (I waited 30+ minutes, and eventually killed the job).

          Runping Qi added a comment -

          The criteria for starting a speculative execution should also include
          how much (or how little) progress has been made in the past M minutes.
          If a task is detected to be stalled, a speculative attempt should be
          started, no matter how close the original one is to completion.
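
          A rough sketch of that kind of stall check, assuming the scheduler keeps a progress reading from M minutes ago (all names and thresholds below are hypothetical, not Hadoop's speculative-execution code):

          // Hypothetical stall detector; thresholds are illustrative only.
          public class StallCheckSketch {
            // The "M minutes" window separating the two progress readings.
            static final long WINDOW_MS = 10 * 60 * 1000L;
            // Minimum progress expected over one window before a task counts as stalled.
            static final float MIN_DELTA = 0.01f;

            /** progressThen = reading from WINDOW_MS ago, progressNow = current reading, both in 0..1. */
            static boolean shouldSpeculate(float progressThen, float progressNow) {
              // Speculate on stalled tasks even if they are, say, 99% complete.
              return (progressNow - progressThen) < MIN_DELTA;
            }

            public static void main(String[] args) {
              System.out.println(shouldSpeculate(0.98f, 0.98f)); // true: stalled near completion
              System.out.println(shouldSpeculate(0.40f, 0.55f)); // false: still making progress
            }
          }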

          Amar Kamat added a comment -

          HADOOP-1984 will cause more waiting here, since the time taken to send the fetch-failure notifications will increase. The simplest solution seems to be to re-schedule all the maps on the blacklisted node. The question is: can we do better? Can the JT infer and handle this situation?

          Runping Qi added a comment -

          Re-executing the mappers will likely be the correct action in most cases.
          Even in the cases where this is not optimal, the cost will not be that high.
          Thus, I think that is the right behavior, until somebody comes up with a simple and effective alternative.

          Devaraj Das added a comment -

          Re-executing the mappers will likely be the correct action in most cases.

          +1

          Amar Kamat added a comment -

          Attaching a patch that marks blacklisted trackers as lost. Since FAILED and KILLED tasks have associated counts, this patch ignores those tasks; RUNNING and SUCCEEDED tasks are re-scheduled. If a task is FAILED, it is still considered FAILED because
          1) there are counts associated with it, and they are used for killing a TIP, and
          2) changing the state would cause an inconsistency between the failed-task counts and the actual number of failed tasks.
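
          The gist of that filtering, as a rough stand-alone sketch (the real change lives in the JobTracker/JobInProgress; the types and method below are stand-ins, not the patch itself):

          import java.util.ArrayList;
          import java.util.List;

          public class BlacklistedTrackerSketch {
            enum TaskState { RUNNING, SUCCEEDED, FAILED, KILLED }

            /** Pick which tasks on a blacklisted tracker should be re-scheduled. */
            static List<String> tasksToReschedule(List<String> ids, List<TaskState> states) {
              List<String> reschedule = new ArrayList<>();
              for (int i = 0; i < ids.size(); i++) {
                switch (states.get(i)) {
                  case RUNNING:
                  case SUCCEEDED:
                    reschedule.add(ids.get(i)); // their outputs can no longer be served reliably
                    break;
                  case FAILED:
                  case KILLED:
                    // Left untouched: their counts feed the per-TIP failure limit, and
                    // changing the state would skew the failed-task bookkeeping.
                    break;
                }
              }
              return reschedule;
            }

            public static void main(String[] args) {
              System.out.println(tasksToReschedule(
                  List.of("m_0", "m_1", "m_2"),
                  List.of(TaskState.SUCCEEDED, TaskState.FAILED, TaskState.RUNNING))); // [m_0, m_2]
            }
          }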

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12377928/HADOOP-2175-v1.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included -1. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1979/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1979/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1979/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1979/console

          This message is automatically generated.

          Amareshwari Sriramadasu added a comment -

          Code looks good

          Amar Kamat added a comment -

          Submitting a patch with a test case.

          Devaraj Das added a comment -

          I am not clear why you have the check in JobInProgress for doing lostTaskTracker outside addTrackerTaskFailure. You could do the check inside the method, right?
          Also, inside lostTaskTracker you check whether the task was already FAILED/KILLED. Do you need to do the check for KILLED?
          On the change to MiniMRCluster, I am not convinced that this is the right thing to do (waiting for 10 seconds and then giving up).
          On TestLostBlackListedTracker, I don't think you need to make it that complicated. A simple dummy-split-based map should work. In that case you don't have to change TestRackAwareTaskPlacement. The way you get events is also not very reliable w.r.t. timing. In the first call to getTaskCompletionEvents, you might get events.length = 0. Isn't this a problem? I'd say wait for the job to complete and then get the events.

          Amar Kamat added a comment -

          am not clear why you have the check in JobInProgress for doing lostTaskTracker outside the addTrackerTaskFailure.

          +1

          Also, inside lostTaskTracker you check for whether the task was already FAILED/KILLED

          I did that because a TIP that failed or was killed before the TT got lost should stay failed/killed. There is no need to re-schedule it or change its status. Since the task was not killed because of the lost TT, I ignored it.

          On the change to MiniMRCluster ....

          I think there is a problem with MiniMRCluster w.r.t lost TTs. It keeps on trying for the TT to be idle and eventually the test times out. I am still trying to find out why the MiniMR gets stuck.

          On the TestLostBlackListedTracker, i don't think you need to make it that complicated

          +1.

          The way you get events is also not very reliable w.r.t timing. In the first call to getTaskCompletionEvents, you might get events.length = 0

          I use launchJob, which waits for the job to complete. It's a blocking call.

          Amar Kamat added a comment -

          This patch incorporates Devaraj's comments. The changes are as follows:

          • The call to lostTaskTracker now expects a reason/info for why/how the tracker is lost.
          • Losing a blacklisted tracker happens in addTrackerFailure.
          • Before losing a tracker, a check is made that the tracker exists in the JT, and its status is updated so that in the next heartbeat cycle the TT gets re-initialized.
          • The test case no longer depends on timing.
          • The only changes in MiniMRCluster are to do with the default conf for the JT/TT.
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12378538/HADOOP-2175-v2.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 7 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2043/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2043/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2043/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2043/console

          This message is automatically generated.

          Amar Kamat added a comment -

          The log messages in the earlier patch were different from the ones in trunk. Changing the log messages (the rest of the patch remains the same).

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12378549/HADOOP-2175-v2.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 7 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2047/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2047/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2047/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2047/console

          This message is automatically generated.

          Devaraj Das added a comment -

          Come to think of it, lostTaskTracker may not be the best way to go, since it can potentially affect tasks from multiple jobs. We probably need to make lostTaskTracker take a JobInProgress argument and do failedTask only for tasks of that job. The other option is to implement APIs that give the TIPs and task ids corresponding to a job & tasktracker combination, and then invoke failedTask in the JobInProgress for each TIP/taskId. The second approach seems cleaner in general, but for the first approach most of the necessary infrastructure is already there. Thoughts?

          Amar Kamat added a comment -

          After some offline discussions with some folks here, this is what seems reasonable: kill the maps on a per-map basis, and tweak the logic for killing maps due to "too many fetch failures", which currently depends on notifications from all running reducers, so that just one notification is enough if the tracker in question has been blacklisted. That way we will not be too aggressive (we don't kill too many maps in one go), and we will be harsh with the map corresponding to the failed fetch. Thoughts?
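
          One way to picture that threshold change, as a hedged sketch (the real logic lives in the JobTracker's fetch-failure handling; the class, names, and constants here are made up for illustration):

          // Illustrative threshold logic only; not the actual JobTracker code.
          public class FetchFailureThresholdSketch {
            static final int NORMAL_THRESHOLD = 3; // hypothetical: complaints from several reducers

            /** Decide whether a map should be re-executed after a fetch-failure report. */
            static boolean shouldKillMap(int failureNotifications, boolean trackerBlacklisted) {
              // Be harsh only with maps whose tracker is already blacklisted:
              // a single reducer complaint is enough there; otherwise wait for more.
              int threshold = trackerBlacklisted ? 1 : NORMAL_THRESHOLD;
              return failureNotifications >= threshold;
            }

            public static void main(String[] args) {
              System.out.println(shouldKillMap(1, true));  // true: blacklisted tracker, one complaint suffices
              System.out.println(shouldKillMap(1, false)); // false: healthy tracker, wait for more reducers
            }
          }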

          Runping Qi added a comment -

          +1

          Amar Kamat added a comment -

          The only concern is when all the maps that are yet to be fetched are on the same blacklisted tracker. The reason is that each reducer fetches only one map per host at a time. Hence killing all the maps will take
          5 min * num-maps-on-tracker / num-reducers in the best case, and 5 min * num-maps-on-tracker in the worst case, assuming the default config.
          Following are some possible tweaks:
          1) Keep track of the total failures registered against the tracker (per job) and kill all the maps for a job if the total failures for that job exceed 25%.
          2) Keep a set of unique hosts per job that have registered failures against a blacklisted tracker, and kill all the maps for a job once all the reducers have complained against the blacklisted tracker.
          Currently we do something similar for killing a map based on fetch failures. We should do something similar for trackers, i.e. re-schedule all the maps (per job, maybe) in the case of blacklisted trackers. In the future we may relax the condition that the tracker be blacklisted. Thoughts?
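
          To make that arithmetic concrete with illustrative numbers: with the default 5-minute notification delay, 20 maps on the blacklisted tracker, and 10 reducers, the best case is roughly 5 min * 20 / 10 = 10 minutes, and the worst case is roughly 5 min * 20 = 100 minutes before all of those maps are killed.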

          Runping Qi added a comment -

          will take 5min * num-maps-on-tracker/num-reducers in the best case and 5min * num-maps-on-tracker in the worst case assuming default config.

          Why will it take so long to fetch the map output?
          Why fetch only one map per host?

          Amar Kamat added a comment -

          In the default case it will take at least 5 minutes to send a notification. The reducer code is such that only one map task per host is tried at a time; it will try a new map task only after the earlier one succeeds or fails.

          Sameer Paranjpye added a comment -

          Let's not confuse lost and blacklisted tasktrackers. A lost tasktracker is one that doesn't check in with the JT, whereas a tasktracker blacklisted for a job is one that causes tasks of that job to fail; the two need to be handled very differently.

          We should move this issue to 0.18. We don't have a coherent model for task failures, the blacklisting logic is already messy and adding half a dozen if statements will only make it messier.

          Devaraj Das added a comment -

          I agree with Sameer. We should probably step back and look at the model of killing a map based on fetch-failure notifications. Today, we kill maps based on fetch-failure notifications on a per-map basis, and we wait for a majority of the reducers to tell the JobTracker about the fetch failing for a particular map.
          With the random ordering of map-output fetches and the back-off per failed fetch, this can take a long time per map. That is what you observed, Runping, IMO.
          Instead, we should probably include the tracker on which the map ran in the logic for killing a map: if we get too many fetch-failure notifications for maps that ran on a particular tracker, which we will detect much faster, we should kill those maps on that tracker for which we are seeing fetch-failure notifications. That will take care of the case where only the Jetty is faulty (the tracker is not blacklisted, as it could, and probably still can, execute tasks).
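
          A rough sketch of the per-tracker accounting being suggested (the class, field, and method names below are invented for illustration and the threshold is arbitrary; this is not the JobTracker's actual implementation):

          import java.util.Collections;
          import java.util.HashMap;
          import java.util.HashSet;
          import java.util.Map;
          import java.util.Set;

          public class TrackerFetchFailureSketch {
            static final int TRACKER_FAILURE_LIMIT = 4; // made-up threshold

            // tracker name -> map attempt ids that reducers have reported fetch failures for
            private final Map<String, Set<String>> failuresByTracker = new HashMap<>();

            /** Record a fetch-failure notification; return the maps to kill on that tracker, if any. */
            Set<String> reportFetchFailure(String tracker, String mapAttemptId) {
              Set<String> failed = failuresByTracker.computeIfAbsent(tracker, t -> new HashSet<>());
              failed.add(mapAttemptId);
              // Once a tracker accumulates enough distinct failing maps, kill exactly those maps.
              // This also covers the case where only the Jetty is broken and the tracker is never blacklisted.
              return failed.size() >= TRACKER_FAILURE_LIMIT
                  ? new HashSet<>(failed)
                  : Collections.emptySet();
            }
          }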

          Amar Kamat added a comment -

          Needs further investigation.


            People

            • Assignee:
              Amar Kamat
            • Reporter:
              Runping Qi
            • Votes:
              0
            • Watchers:
              1
