[HADOOP-4517] unstable dfs when running jobs on 0.18.1 - ASF JIRA


Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.18.1
Fix Version/s: 0.18.2
Component/s: None
Labels:
None
Environment:
hadoop-0.18.1 plus patches HADOOP-4277 HADOOP-4271 HADOOP-4326 HADOOP-4314 HADOOP-3914 HADOOP-4318 HADOOP-4351 HADOOP-4395

Hadoop Flags:
Reviewed

Description

Two attempts of a job using 6000 maps and 1900 reduces both failed:

1st attempt: failed during the reduce phase after 22 hours with 31 dead datanodes, most of which became unresponsive due to an exception; DFS lost blocks.
2nd attempt: failed during the map phase after 5 hours with 5 dead datanodes, also due to an exception; DFS lost blocks, which caused the job to fail.

I will post a typical datanode exception and attach a thread dump.

Attachments


- 4517_20081027_0.18.patch (27/Oct/08 20:43, 4 kB, Tsz-wo Sze)
- 4517_20081027.patch (27/Oct/08 19:02, 4 kB, Tsz-wo Sze)
- 4517_20081024d.patch (25/Oct/08 17:27, 3 kB, Tsz-wo Sze)
- 4517_20081024d_0.18.patch (25/Oct/08 01:44, 3 kB, Tsz-wo Sze)
- 4517_20081024c_0.18.patch (25/Oct/08 00:26, 3 kB, Tsz-wo Sze)
- 4517_20081024b_0.18.patch (24/Oct/08 22:27, 3 kB, Tsz-wo Sze)
- 4517_20081024.patch (24/Oct/08 22:01, 3 kB, Tsz-wo Sze)
- datanode.out (24/Oct/08 19:18, 301 kB, Christian Kunz)


People

Assignee: Tsz-wo Sze

Reporter: Christian Kunz

Votes: 0

Watchers: 3

Dates

Created: 24/Oct/08 19:16

Updated: 08/Jul/09 16:43

Resolved: 28/Oct/08 00:07