Details
-
Bug
-
Status: Closed
-
Blocker
-
Resolution: Fixed
-
0.18.1
-
None
-
None
-
hadoop-0.18.1 plus patches
HADOOP-4277, HADOOP-4271, HADOOP-4326, HADOOP-4314, HADOOP-3914, HADOOP-4318, HADOOP-4351, HADOOP-4395
-
Reviewed
Description
Two attempts of a job using 6000 maps and 1900 reduces:
1st attempt: failed during the reduce phase after 22 hours with 31 dead datanodes, most of which became unresponsive due to an exception; DFS lost blocks.
2nd attempt: failed during the map phase after 5 hours with 5 dead datanodes due to an exception; DFS lost blocks, which caused the job failure.
I will post a typical datanode exception and attach a thread dump.