- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: 0.18.1
- Fix Version/s: 0.18.2
- Component/s: None
- Labels: None
- Environment: hadoop-0.18.1 plus patches HADOOP-4277, HADOOP-4271, HADOOP-4326, HADOOP-4314, HADOOP-3914, HADOOP-4318, HADOOP-4351, HADOOP-4395
- Hadoop Flags: Reviewed
Two attempts of a job using 6000 maps and 1900 reduces:
1st attempt: failed during the reduce phase after 22 hours with 31 dead datanodes, most of which became unresponsive due to an exception; DFS lost blocks.
2nd attempt: failed during the map phase after 5 hours with 5 dead datanodes due to an exception; DFS lost blocks, which caused the job failure.
I will post a typical datanode exception and attach a thread dump.