When I tried to simulate a read timeout for one of the attempts on a TaskTracker, I hit a few bugs in the new shuffle code.
1. The connection logic is wrong, as the JIRA description says: both connect timeouts and read timeouts are currently treated as read timeouts.
Solution: Separate connect() and getInputStream() into different try blocks, as in pre-0.21.
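A minimal sketch of the idea (not the actual Fetcher code; the names and structure here are illustrative): by wrapping the connect step and the read step in separate try blocks, the caller can tell which phase actually timed out instead of charging everything to the read phase.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

public class TimeoutPhase {
    enum Phase { CONNECT, READ, NONE }

    // Wrap each step in its own try block so a connect timeout is not
    // misreported as a read timeout. The steps stand in for connect()
    // and getInputStream() on the shuffle connection.
    static Phase classify(Callable<Void> connectStep, Callable<Void> readStep) {
        try {
            connectStep.call();
        } catch (Exception e) {
            return Phase.CONNECT;   // failure attributed to connecting
        }
        try {
            readStep.call();
        } catch (Exception e) {
            return Phase.READ;      // failure attributed to reading
        }
        return Phase.NONE;
    }

    public static void main(String[] args) {
        Callable<Void> ok = () -> null;
        Callable<Void> timesOut = () -> {
            throw new SocketTimeoutException("timed out");
        };
        System.out.println(classify(timesOut, ok)); // CONNECT
        System.out.println(classify(ok, timesOut)); // READ
    }
}
```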
2. If a connect or read timeout occurs while draining the connection's input stream, the reducer fails.
Solution: Any timeout at this point should be ignored so the reducer can keep running.
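The draining fix can be sketched as follows (assumed shape, not the patch itself): once the map output has been consumed, the leftover bytes on the connection are discarded, and any IOException raised during that discard is swallowed rather than propagated to the reducer.

```java
import java.io.IOException;
import java.io.InputStream;

public class DrainQuietly {
    // Drain the remainder of a connection's stream. Errors here must not
    // fail the reducer: the map output itself was already read successfully,
    // so a timeout while discarding trailing bytes is harmless.
    static void drainQuietly(InputStream in) {
        try {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // discard
            }
        } catch (IOException ignored) {
            // connect/read timeout during draining: ignore and move on
        } finally {
            try { in.close(); } catch (IOException ignored) { }
        }
    }

    public static void main(String[] args) {
        // A stream whose every read times out, simulating the failure.
        InputStream failing = new InputStream() {
            @Override public int read() throws IOException {
                throw new java.net.SocketTimeoutException("read timed out");
            }
        };
        drainQuietly(failing);  // does not throw
        System.out.println("reducer keeps running");
    }
}
```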
3. If a tracker ran map tasks 1, 2, and 3, and the reducer times out while reading the output of the 2nd task, it penalizes the 1st task instead. This happens because the reducer reads all three map outputs over a single connection.
Solution: After reading each map output, do a flush() so that the correct map is penalized for any subsequent error.
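A toy model of the attribution problem (the names `fetch`, `copied`, etc. are invented for illustration, not the Fetcher API): by committing each completed map output before starting the next one, a mid-connection failure is charged to the map actually being read rather than to the first map of the connection.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PenalizeCorrectMap {
    // Simulate fetching several map outputs over one connection. After each
    // output completes it is recorded as copied (the role flush() plays in
    // the fix), so a later failure is attributed to the right map.
    static String fetch(List<String> maps, String failsOn, Set<String> copied) {
        for (String map : maps) {
            if (map.equals(failsOn)) {
                return map;       // this map, and only this map, is penalized
            }
            copied.add(map);      // commit this output before moving on
        }
        return null;              // no failure
    }

    public static void main(String[] args) {
        Set<String> copied = new HashSet<>();
        String penalized =
            fetch(Arrays.asList("map1", "map2", "map3"), "map2", copied);
        System.out.println(penalized); // map2, not map1
        System.out.println(copied);    // only map1 was committed
    }
}
```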
4. If a successful task is later marked FAILED because of "Too many fetch failures", it is not removed from reduce.Fetcher's set of known pending maps, so the reducer keeps trying to read the FAILED map indefinitely.
Solution: EventFetcher currently does not add FAILED/KILLED TaskCompletionEvents to the obsolete maps; it should, as pre-0.21 did.
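The intended event handling can be sketched like this (a simplified model; the set names and `process` method are hypothetical, not EventFetcher's real signatures): routing FAILED and KILLED attempts into the obsolete set, and dropping them from the pending set, is what stops the fetcher from retrying them forever.

```java
import java.util.HashSet;
import java.util.Set;

public class EventFetcherSketch {
    enum Status { SUCCEEDED, FAILED, KILLED, OBSOLETE }

    // Model of completion-event handling: FAILED/KILLED attempts must join
    // the obsolete set too (pre-0.21 behavior); otherwise the reducer keeps
    // fetching a map that will never succeed.
    static void process(String attempt, Status status,
                        Set<String> pendingMaps, Set<String> obsolete) {
        switch (status) {
            case SUCCEEDED:
                pendingMaps.add(attempt);
                break;
            case FAILED:
            case KILLED:
            case OBSOLETE:
                obsolete.add(attempt);
                pendingMaps.remove(attempt);  // stop retrying this attempt
                break;
        }
    }

    public static void main(String[] args) {
        Set<String> pending = new HashSet<>();
        Set<String> obsolete = new HashSet<>();
        process("attempt_1", Status.SUCCEEDED, pending, obsolete);
        // Later, the same attempt is failed with "Too many fetch failures":
        process("attempt_1", Status.FAILED, pending, obsolete);
        System.out.println(pending);   // empty: no more fetch attempts
        System.out.println(obsolete);  // contains attempt_1
    }
}
```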
The attached patch fixes all of the above bugs.