Hadoop Common / HADOOP-3478

The algorithm to decide map re-execution on fetch failures can be improved

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Changed reducers to fetch maps in the same order for a given host to speed up identification of the faulty maps; reducers still randomize the host selection to distribute load.

      Description

      The algorithm to decide map re-execution on fetch failures can be improved.

      Attachments

      1. hadoop-3478.patch
        11 kB
        Jothi Padmanabhan
      2. hadoop-3478-v1.patch
        11 kB
        Jothi Padmanabhan

        Activity

        Jothi Padmanabhan added a comment -

        This is related to HADOOP-3327.

        The current design for Map re-execution on fetch failures (in shuffle) is
        to wait for 3 fetch failure notifications for a given map before
        re-executing it.

        For load balancing, the reducers randomize the order of maps to fetch.
        If a fetch for a particular map fails, there is no guarantee that this
        map would be attempted for fetch in the next iteration. What this implies
        is that if a particular location is faulty, it might take a long time to
        determine this and re-execute all the maps for this location.

        For example, consider the simplest case of 1 reducer, fetching maps
        M1,M2 and M3 from a single location L1. Assume that the location L1
        developed a hardware failure after the map execution and so none of the
        maps, M1,M2 or M3 is available.

        The following could be the order of fetch attempts in the reducer:

        1. Try Fetching M1 from L1 --> Failure
        2. Try Fetching M2 from L1 --> Failure
        3. Try Fetching M3 from L1 --> Failure

        ==> At this point we have three fetch failures from location L1, but
        they are for different maps, so the map re-execution criterion is not met.

        4. Try Fetching M1 from L1 --> Failure
        5. Try Fetching M2 from L1 --> Failure
        6. Try Fetching M3 from L1 --> Failure

        ==> Now, there are six failures from L1, but still the maps are not
        re-executed.

        7. Try Fetching M1 from L1 --> Failure
        ==> Now, M1 is re-executed
        8. Try Fetching M2 from L1 --> Failure
        ==> Now, M2 is re-executed
        9. Try Fetching M3 from L1 --> Failure
        ==> Now, M3 is re-executed

        A better alternative would be to force the order in which maps are
        fetched from a particular location. A simple mechanism could be to sort
        by map IDs (for that location) and then fetch them in that order. An
        additional heuristic could be added to hasten the re-execution of maps
        if the number of failures at a particular location exceeds some threshold
        (irrespective of the number of fetch failures for that particular map ID).

        In the above example, assume that the sorted map IDs for L1 are in the
        order M1, M2 and M3. The sequence then becomes:

        1. Try Fetching M1 from L1 --> Failure
        2. Try Fetching M1 from L1 --> Failure
        3. Try Fetching M1 from L1 --> Failure

        ==> Re-execute Map M1 now.

        4. Try Fetching M2 from L1 --> Failure
        5. Try Fetching M2 from L1 --> Failure

        ==> Assuming the threshold for max failures from a location is 5, M2 will
        be re-executed now.

        6. Try Fetching M3 from L1 --> Failure
        ==> Re-execute M3 now.
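
        The bookkeeping this proposal implies could look roughly like the sketch below. The names (FetchFailureTracker, MAX_FAILURES_PER_LOCATION, recordFailure) and the thresholds are made up for illustration; this is not the existing framework code, just the two criteria from the example above in code form.

        {code:java}
        import java.util.HashMap;
        import java.util.Map;

        // Illustrative sketch: counts fetch failures per map and per location and
        // decides when a map should be re-executed, using the two criteria above.
        class FetchFailureTracker {
          private static final int MAX_FETCH_FAILURES_PER_MAP = 3; // existing rule
          private static final int MAX_FAILURES_PER_LOCATION  = 5; // proposed heuristic

          private final Map<String, Integer> failuresPerMap = new HashMap<String, Integer>();
          private final Map<String, Integer> failuresPerLocation = new HashMap<String, Integer>();

          /** Record one fetch failure; returns true if the map should be re-executed. */
          boolean recordFailure(String mapId, String location) {
            int mapFailures = increment(failuresPerMap, mapId);
            int locFailures = increment(failuresPerLocation, location);
            // Re-execute when this map alone has failed often enough, or when the
            // location as a whole looks faulty.
            return mapFailures >= MAX_FETCH_FAILURES_PER_MAP
                || locFailures >= MAX_FAILURES_PER_LOCATION;
          }

          private static int increment(Map<String, Integer> counts, String key) {
            Integer current = counts.get(key);
            int next = (current == null ? 0 : current) + 1;
            counts.put(key, next);
            return next;
          }
        }
        {code}

        With the per-location fetch order fixed, the three failures for M1 arrive back to back, and the location-wide threshold then catches M2 and M3 quickly, as in the walk-through above.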

        Arun C Murthy added a comment -

        Jothi, the reason you don't want to sort by MapId is HADOOP-1270; given that, it is quite hard to sort by MapId for a given location.

        One option is to schedule some failed fetches before the normal ones in ReduceCopier.fetchOutputs. However, it is quite possible that Jetty on the task-tracker is merely stuck for a few seconds and then untangles itself, so please keep that in mind too.
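
        A minimal sketch of that first option, with invented names (FetchScheduler, previouslyFailed); it only reorders the per-iteration schedule so that previously failed fetches come first, and is not the actual ReduceCopier code:

        {code:java}
        import java.util.ArrayList;
        import java.util.Collections;
        import java.util.Comparator;
        import java.util.List;
        import java.util.Set;

        // Illustrative sketch: move previously failed fetches to the front of the
        // schedule while leaving the (randomized) order of the rest untouched.
        class FetchScheduler {
          List<String> order(List<String> pendingMapIds, final Set<String> previouslyFailed) {
            List<String> schedule = new ArrayList<String>(pendingMapIds);
            Collections.sort(schedule, new Comparator<String>() {
              public int compare(String a, String b) {
                boolean failedA = previouslyFailed.contains(a);
                boolean failedB = previouslyFailed.contains(b);
                if (failedA == failedB) {
                  return 0;              // stable sort keeps the existing order
                }
                return failedA ? -1 : 1; // failed fetches first
              }
            });
            return schedule;
          }
        }
        {code}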

        Devaraj Das added a comment -

        Yes, we should randomize by the hosts. But for a given host we should sort by the mapIDs to detect faults early enough (as per the comments above). The knownOutputs structure today is a list. That might be done away with, and instead a map from locations to MapIDs could be maintained (whenever we get a map completion event, we know the location anyway).
        In order to protect against early or too aggressive killing, we should probably maintain the strategy of waiting for notifications from multiple reducers for all maps. Since the map failure notifications are sent only after a certain number of retries, we should be okay protecting the maps against temporary network glitches.
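
        A small sketch of such a structure, with invented names (KnownOutputsByHost); it keys completed map outputs by host and keeps the map IDs for each host in sorted order, as suggested here:

        {code:java}
        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;
        import java.util.TreeSet;

        // Illustrative sketch: replace the flat knownOutputs list with a map from
        // host to its completed map IDs, kept in a deterministic (sorted) order.
        class KnownOutputsByHost {
          private final Map<String, TreeSet<String>> outputsPerHost =
              new HashMap<String, TreeSet<String>>();

          /** Called for every map completion event; the event carries the host. */
          void add(String host, String mapId) {
            TreeSet<String> maps = outputsPerHost.get(host);
            if (maps == null) {
              maps = new TreeSet<String>();
              outputsPerHost.put(host, maps);
            }
            maps.add(mapId);
          }

          /** The map IDs for one host, always returned in the same order. */
          List<String> mapsFor(String host) {
            TreeSet<String> maps = outputsPerHost.get(host);
            return maps == null ? new ArrayList<String>()
                                : new ArrayList<String>(maps);
          }
        }
        {code}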

        Runping Qi added a comment -

        bq. In order to protect against early or too aggressive killing, we should probably maintain the strategy of waiting for notifications from multiple reducers for all maps. Since the map failure notifications are sent only after a certain number of retries, we should be okay in protecting the maps against temporary network glitches.

        We should differentiate based on the progress stage of the job.

        If there are a lot of unfinished mappers, then we should not do aggressive mapper re-executions.

        If reducers have a lot of un-fetched map outputs, they can wait for a longer period of time before re-fetching the map outputs that previously failed to fetch. However, if one or more reducers are waiting for only one or a few map outputs, then the reducers should retry aggressively, and if the failures persist, the mappers should be re-executed aggressively.
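
        A toy sketch of that stage-dependent policy; the class, the mode names and the thresholds are invented purely for illustration:

        {code:java}
        // Illustrative sketch: choose how aggressively to retry fetches and
        // re-execute maps based on how far along the job is.
        class RetryPolicy {
          enum Mode { PATIENT, AGGRESSIVE }

          private static final int MANY_UNFINISHED_MAPS = 10;  // made-up threshold
          private static final int LAST_FEW_OUTPUTS = 3;       // made-up threshold

          Mode choose(int unfinishedMaps, int unfetchedOutputsForThisReducer) {
            if (unfinishedMaps > MANY_UNFINISHED_MAPS) {
              return Mode.PATIENT;     // lots of map work left; don't pile on re-executions
            }
            if (unfetchedOutputsForThisReducer <= LAST_FEW_OUTPUTS) {
              return Mode.AGGRESSIVE;  // reducer is blocked on the last few map outputs
            }
            return Mode.PATIENT;
          }
        }
        {code}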

        Jothi Padmanabhan added a comment -

        bq. Given that it is quite hard to sort by MapId for a given location

        Actually, we do not need to sort the MapIds for a given location. We just need to ensure that the order of map fetches is enforced, i.e., all reducers fetch the maps for a given host in the same order as any other reducer. We could do this without sorting.
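
        One way to get such a consistent order without sorting, sketched with invented names and under the assumption that every reducer receives the map completion events from the JobTracker in the same order, is to simply keep each host's map IDs in arrival order:

        {code:java}
        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        // Illustrative sketch: append map IDs per host in completion-event order.
        // If all reducers see the events in the same order (assumed here), every
        // reducer ends up with the same per-host fetch order, with no sorting.
        class EventOrderPerHost {
          private final Map<String, List<String>> mapsPerHost =
              new HashMap<String, List<String>>();

          void onMapCompletion(String host, String mapId) {
            List<String> maps = mapsPerHost.get(host);
            if (maps == null) {
              maps = new ArrayList<String>();
              mapsPerHost.put(host, maps);
            }
            maps.add(mapId);           // arrival order, no sort needed
          }

          List<String> fetchOrderFor(String host) {
            List<String> maps = mapsPerHost.get(host);
            return maps == null ? new ArrayList<String>() : maps;
          }
        }
        {code}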

        bq. If there are a lot of unfinished mappers, then we should not do aggressive mapper re-executions.

        Yes. This was proposed as a solution in HADOOP-3327. The JobTracker should use the number of unfinished mappers as one of the criteria for deciding whether maps should be re-executed aggressively or not.

        bq. If reducers have a lot of un-fetched map outputs, they can wait for a longer period of time before re-fetching the map outputs that previously failed to fetch.

        The problem with this approach is that, if a map output is faulty, it takes a long time to detect this and then re-execute the map. If the detection is done earlier, the map will likely finish re-execution by the time the reducer fetches the other map outputs, and the overall time is minimized.

        Jothi Padmanabhan added a comment -

        Patch for review

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12384470/hadoop-3478.patch
        against trunk revision 670278.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2719/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2719/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2719/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2719/console

        This message is automatically generated.

        Jothi Padmanabhan added a comment -

        Patch after incorporating review comments

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12384735/hadoop-3478-v1.patch
        against trunk revision 671563.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2743/console

        This message is automatically generated.

        Jothi Padmanabhan added a comment -

        Attaching the correct patch

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12384741/hadoop-3478-v1.patch
        against trunk revision 671563.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2747/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2747/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2747/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2747/console

        This message is automatically generated.

        Devaraj Das added a comment -

        I just committed this. Thanks, Jothi!

        Hudson added a comment -

        Integrated in Hadoop-trunk #581 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/ )

          People

          • Assignee: Jothi Padmanabhan
          • Reporter: Jothi Padmanabhan
          • Votes: 0
          • Watchers: 3
