Details
Bug
Status: Closed
Major
Resolution: Fixed
0.6.2
None
None
Description
In ReduceTaskRunner, the main loop that sends heartbeats waits on copyResults, which is notified only when a copy thread finishes a copy. As a result, healthy reduce tasks that are still copying data can fail if no map task output finishes copying within "mapred.task.timeout".
ReduceTaskRunner.java:490
try {
  copyResults.wait();
} catch (InterruptedException e) { }
The wait() call should take a timeout, possibly taskTimeout/2, after which the loop should send a heartbeat and go back to waiting.
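
A minimal sketch of the proposed timed wait follows. It is not the actual ReduceTaskRunner code or the committed patch: the class name TimedWaitSketch, the methods addResult, sendHeartbeat and heartbeatCount, and the use of a plain list of strings for copyResults are all assumptions made for illustration. The only ideas taken from this report are waiting on copyResults with a timeout of taskTimeout/2 and sending a heartbeat each time the wait ends without a result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: a heartbeat loop that never blocks longer than taskTimeout/2. */
public class TimedWaitSketch {
    // Hypothetical stand-in for the copyResults collection named in the report.
    private final List<String> copyResults = new ArrayList<>();
    private final AtomicInteger heartbeats = new AtomicInteger();
    private final long taskTimeoutMs;

    public TimedWaitSketch(long taskTimeoutMs) {
        this.taskTimeoutMs = taskTimeoutMs;
    }

    /** Called by a copy thread when one map output has been copied. */
    public void addResult(String result) {
        synchronized (copyResults) {
            copyResults.add(result);
            copyResults.notifyAll();
        }
    }

    /** Placeholder heartbeat: a real task would report progress to its tracker here. */
    private void sendHeartbeat() {
        heartbeats.incrementAndGet();
    }

    public int heartbeatCount() {
        return heartbeats.get();
    }

    /** Blocks until a copy result is available, sending a heartbeat between waits. */
    public String waitForCopyResult() throws InterruptedException {
        // wait(0) would block indefinitely, so keep the timeout at least 1 ms.
        long waitMs = Math.max(1L, taskTimeoutMs / 2);
        while (true) {
            synchronized (copyResults) {
                if (copyResults.isEmpty()) {
                    copyResults.wait(waitMs); // returns on notify, timeout, or spurious wakeup
                }
                if (!copyResults.isEmpty()) {
                    return copyResults.remove(0);
                }
            }
            // No result arrived during this wait: report liveness, then wait again.
            sendHeartbeat();
        }
    }
}
```

The heartbeat is sent outside the synchronized block so that copy threads are not held up while it is in flight. A spurious wakeup can only cause an extra, early heartbeat, which is harmless for this purpose.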