|
Jothi, the reason you don't want to sort by MapId is
The one option is to schedule some failed fetches before the normal ones in ReduceCopier.fetchOutputs, however it is very real that Jetty on the task-tracker is stuck for a few seconds and then untangles itself, please keep that in mind too. Yes, we should randomize by the hosts. But for a given host we should sort it by the mapIDs to detect faults early enough (the comments above). The knownOutputs stucture today is a list. That might be done away with and instead a map from locations to MapIDs could be maintained (whenever we get a map completion event, we know the location anyway).
In order to protect against early or too aggressive killing, we should probably maintain the strategy of waiting for notifications from multiple reducers for all maps. Since the map failure notifications are sent only after a certain number of retries, we should be okay in protecting the maps against temporary network glitches. bq >In order to protect against early or too aggressive killing, we should probably maintain the strategy of waiting for notifications from multiple >reducers for all maps. Since the map failure notifications are sent only after a certain number of retries, we should be okay in protecting the >maps against temporary network glitches We should differentiate between the progress stage of the job. If there are a lot of unfinished mappers, then we should not do aggressive mapper re-executions. If reducers have a lot of un-fetched map outputs, they can wait for a longer period time before re-fetch the
Actually, we do not need to sort the MapIds for a given location. We just need to ensure that the order of map fetches is enforced – All reducers fetch the maps in the same order as any other. We could do this without sorting.
Yes. This was proposed as a solution in
The problem with this approach is that, if the map is faulty, it takes a long time to detect and then re-execute it. If the detection is done earlier, the map will likely finish re-execution by the time the reducer fetches other map outputs and the over all time is minimized. -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12384470/hadoop-3478.patch against trunk revision 670278. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2719/testReport/ This message is automatically generated. Patch after incorporating review comments
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12384735/hadoop-3478-v1.patch against trunk revision 671563. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2743/console This message is automatically generated. Attaching the correct patch
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12384741/hadoop-3478-v1.patch against trunk revision 671563. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2747/testReport/ This message is automatically generated. I just committed this. Thanks, Jothi!
Integrated in Hadoop-trunk #581 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
HADOOP-3327.The current design for Map re-execution on fetch failures (in shuffle) is
to wait for 3 fetch failure notifications for a given map before
re-executing it.
For load balancing, the reducers randomize the order of maps to fetch.
If a fetch for a particular map fails, there is no guarantee that this
map would be attempted for fetch in the next iteration. What this implies
is that if a particular location is faulty, it might take a long time to
determine this and re-execute all the maps for this location.
For example, consider the simplest case of 1 reducer, fetching maps
M1,M2 and M3 from a single location L1. Assume that the location L1
developed a hardware failure after the map execution and so none of the
maps, M1,M2 or M3 is available.
The following could be the order of execution in the reducer
1. Try Fetching M1 from L1 --> Failure
2. Try Fetching M2 from L1 --> Failure
3. Try Fetching M3 from L1 --> Failure
==> At this point we have three fetch failures from Location L1, but
they are for different Maps. So, the map re-execution criteria is not met.
4. Try Fetching M1 from L1 --> Failure
5. Try Fetching M2 from L1 --> Failure
6. Try Fetching M3 from L1 --> Failure
==> Now, there are six failures from L1, but still the maps are not
re-executed.
7. Try Fetching M1 from L1 --> Failure
==> Now, M1 is re-executed
8. Try Fetching M2 from L1 --> Failure
==> Now, M2 is re-executed
9. Try Fetching M2 from L1 --> Failure
==> Now, M3 is re-executed
A better alternative could be to do this. Force the order in which maps
are executed in a particular location. A simple mechanism could be to sort
by Map Ids (for that location) and then fetch them in that order. An additional heuristic \
could be added to hasten the re-execution of maps if the number of failures
on a particular location exceeds some threshold (irrespective of the
number of fetch failures for that particular map id).
In the above example, let us assume that the sorted map Ids are in
the order M1,M2 and M3. Now, considering the above example
1. Try Fetching M1 from L1 --> Failure
2. Try Fetching M1 from L1 --> Failure
3. Try Fetching M1 from L1 --> Failure
==> Re-execute Map M1 now.
4. Try Fetching M2 from L1 --> Failure
5. Try Fetching M2 from L1 --> Failure
==> Assuming the threshold for max failures from a location is 5, M2 will
be re-executed now.
6. Try Fetching M3 from L1 --> Failure
Re-execute M3 now.