HDFS-3875: Issue handling checksum errors in write pipeline

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.2-alpha
    • Fix Version/s: 2.1.0-beta, 0.23.8
    • Component/s: datanode, hdfs-client
    • Labels: None

      Description

      We saw this issue with one block in a large test cluster. The client is storing the data with replication level 2, and we saw the following:

      • the second node in the pipeline detects a checksum error on the data it received from the first node. We don't know if the client sent a bad checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
      • this caused the second node to get kicked out of the pipeline, since it threw an exception. The pipeline started up again with only one replica (the first node in the pipeline)
      • this replica was later determined to be corrupt by the block scanner, and unrecoverable since it is the only replica

      Attachments

      1. hdfs-3875.branch-0.23.no.test.patch.txt
        8 kB
        Kihwal Lee
      2. hdfs-3875.branch-0.23.patch.txt
        18 kB
        Kihwal Lee
      3. hdfs-3875.branch-0.23.patch.txt
        17 kB
        Kihwal Lee
      4. hdfs-3875.branch-0.23.with.test.patch.txt
        12 kB
        Kihwal Lee
      5. hdfs-3875.branch-2.patch.txt
        18 kB
        Kihwal Lee
      6. hdfs-3875.patch.txt
        18 kB
        Kihwal Lee
      7. hdfs-3875.patch.txt
        18 kB
        Kihwal Lee
      8. hdfs-3875.patch.txt
        18 kB
        Kihwal Lee
      9. hdfs-3875.trunk.no.test.patch.txt
        8 kB
        Kihwal Lee
      10. hdfs-3875.trunk.no.test.patch.txt
        8 kB
        Kihwal Lee
      11. hdfs-3875.trunk.patch.txt
        14 kB
        Kihwal Lee
      12. hdfs-3875.trunk.patch.txt
        15 kB
        Kihwal Lee
      13. hdfs-3875.trunk.with.test.patch.txt
        14 kB
        Kihwal Lee
      14. hdfs-3875.trunk.with.test.patch.txt
        14 kB
        Kihwal Lee
      15. hdfs-3875-wip.patch
        14 kB
        Kihwal Lee


          Activity

           Todd Lipcon added a comment -

          Here's the recovery from the perspective of the NN:

          2012-08-28 19:16:33,532 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-1507505631-172.29.97.196-1337120439433:blk_2632740624757457378_140581786, newGenerationStamp=140581806, newLength=44281856, newNodes=[172.29.97.219:50010], clientNam
          2012-08-28 19:16:33,597 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-1507505631-172.29.97.196-1337120439433:blk_2632740624757457378_140581786) successfully to BP-1507505631-172.29.97.196-1337120439433:blk_2632740624757457378_140581806
          

          Here's the recovery from the perspective of the middle node:

          2012-08-28 19:16:33,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering replica ReplicaBeingWritten, blk_2632740624757457378_140581786, RBW
            getNumBytes()     = 44867072
            getBytesOnDisk()  = 44867072
            getVisibleLength()= 44281856
            getVolume()       = /data/2/dfs/dn/current
            getBlockFile()    = /data/2/dfs/dn/current/BP-1507505631-172.29.97.196-1337120439433/current/rbw/blk_2632740624757457378
            bytesAcked=44281856
            bytesOnDisk=44867072
          

          and then the later checksum exception from the block scanner:

          2012-08-28 19:23:59,275 WARN org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Second Verification failed for BP-1507505631-172.29.97.196-1337120439433:blk_2632740624757457378_140581806
          org.apache.hadoop.fs.ChecksumException: Checksum failed at 44217344
          

           Interestingly, the offset of the checksum failure noticed by the block scanner (44217344) is less than the "acked length" seen at recovery time (44281856).

          On the node in question, I see a fair number of weird errors (page allocation failures etc) in the kernel log. So my guess is that the machine is borked and was silently corrupting memory in the middle of the pipeline. Hence, because the recovery kicked out the wrong node, it ended up persisting a corrupt version of the block instead of a good one.

           Todd Lipcon added a comment -

          Just to brainstorm, here's one potential solution:

          • if the tail node in the pipeline detects a checksum error, then it returns a special error code back up the pipeline indicating this (rather than just disconnecting)
          • if a non-tail node receives this error code, then it immediately scans its own block on disk (from the beginning up through the last acked length). If it detects a corruption on its local copy, then it should assume that it is the faulty one, rather than the downstream neighbor. If it detects no corruption, then the faulty node is either the downstream mirror or the network link between the two, and the current behavior is reasonable.

          Depending on the above, it would report back the errorIndex appropriately to the client, so that the correct faulty node is removed from the pipeline.
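
           As a very rough sketch of that idea (all names below are hypothetical stand-ins, not the actual BlockReceiver/PacketResponder API):

             // Sketch only: how a non-tail datanode might react when the downstream
             // ack carries a checksum error. verifyLocalReplica() is a hypothetical
             // helper that would recompute CRCs over the on-disk block file.
             class UpstreamChecksumCheckSketch {
               final int myIndex;         // this node's position in the pipeline
               final int downstreamIndex; // the downstream mirror's position

               UpstreamChecksumCheckSketch(int myIndex, int downstreamIndex) {
                 this.myIndex = myIndex;
                 this.downstreamIndex = downstreamIndex;
               }

               /** Returns the errorIndex to report back to the client. */
               int handleDownstreamChecksumError(long lastAckedLength) {
                 if (!verifyLocalReplica(lastAckedLength)) {
                   // Our own on-disk copy is corrupt too; we are the faulty node.
                   return myIndex;
                 }
                 // Our copy verifies cleanly: blame the downstream mirror (or the
                 // link to it), which matches the current behavior.
                 return downstreamIndex;
               }

               boolean verifyLocalReplica(long upToLength) {
                 // Placeholder: would re-read the block file from offset 0 up to
                 // upToLength and recompute/compare the stored CRCs.
                 return true;
               }
             }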

           Kihwal Lee added a comment -

           This sounds like the symptom I mentioned in HDFS-3874. The tail node in a three-node pipeline detected a corruption, but its report failed due to HDFS-3874 and it just went away. Since the last of the three nodes in the pipeline just disappeared, the corrupt packet was acked with {SUCCESS, SUCCESS, FAIL}. So the pipeline recreated from the remaining two nodes ended up containing the corrupt portion of data.

           > Depending on the above, it would report back the errorIndex appropriately to the client, so that the correct faulty node is removed from the pipeline.

           • This should cover the cases where a particular datanode corrupts data, IF the client checksum and the storage checksum method are identical.
           • If the two checksum methods are different, datanodes would have recalculated and written out data along with their own checksum. Even if the incoming data was corrupt, it would appear okay on disk of these nodes. The tail node can detect corruption, but if it somehow terminates or gets ignored, there is no retrospective scan that will tell us the integrity of the stored block, since the checksum may have been recreated to match the corrupted data. Maybe we should force datanodes to verify the checksum if the two checksum types are different.
           • Even if we don't have the above issue, special handling is needed for the case where the client is corrupting data. After recreating a pipeline, the same thing will repeat since the client moves un-acked packets back to its data queue and resends them. Fail after trying twice? Or maybe the client should do a self integrity check of the packets in the ack queue if a corruption is present at the first datanode.
           • How will it work with reportBadBlocks() being called by the last node in the pipeline? The semantics of this method do not seem compatible with blocks that are being actively written and could be recovered by calling recoverRbw().
           • Given all these issues, simply failing/abandoning the block may be the easiest way out without missing any other possible corner cases. This will be even more convincing if we have any evidence showing that client-side corruption is the most common cause.

           Kihwal Lee added a comment -

           > If the two checksum methods are different, datanodes would have recalculated and written out data along with their own checksum. Even if the incoming data was corrupt, it would appear okay on disk of these nodes.

          It appears the checksum verification is already done if needsChecksumTranslation is true. There is one less thing to worry about.

           Arun C Murthy added a comment -

          Potential blocker for 2.0.3-alpha.

           Kihwal Lee added a comment -

           I don't think calling reportBadBlocks() alone does any good. Without the client knowing the details of a corruption, it won't be able to recover the block properly. reportBadBlocks() during create is only useful when a corruption is confined to one replica. If we get the in-line corruption detection and recovery right, this call will not be needed during write operations.

           If the meaning of the response in the data packet transfer is to be extended to cover packet corruption:

           • A tail node should not ACK until the checksum of a packet is verified. Currently, an ack is enqueued before the checksum is verified, which in the case of the tail node causes immediate transmission of ACK/SUCCESS.
           • When the tail node is dropped from a pipeline, the other nodes should not simply ack with success, since that would mean the checksum was okay on those nodes.
           • The portions that were ACK'ed with SUCCESS are guaranteed not to be corrupt. To be precise, there can be corruption on disk due to local issues, but not in the data each datanode received. I.e., any on-disk corruption must be an isolated corruption, not a propagated one.

           For the second point, we could have datanodes verify checksums when they lose the mirror node or explicitly get ACK/CORRUPTION. But this can be simplified if we can guarantee that no ACK/SUCCESS is sent back when a corruption is detected in the packet or the mirror node is lost. We can just drop that portion of data by not ACKing the corrupt packet, or by sending ACK/CORRUPTION back for it. I think the client will resend the un-ACK'ed packets in this case.

           The worst case is rewriting some packets. But the advantage is simplicity and avoiding checksum verification of already-written data.

           Kihwal Lee added a comment -

           The key is to prevent ACKing the corrupt packet. Data corruption can be avoided with this alone. For better error recovery, datanodes should return a specific error code to let the client know. Status already has ERROR_CHECKSUM defined. I will have a patch ready soon.
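
           A minimal sketch of the intended ordering on the tail node (local stand-ins for the real BlockReceiver/PacketResponder methods; only Status.ERROR_CHECKSUM is named in the discussion, the rest is assumed):

             // Sketch only: verify the packet before enqueuing the ack, so a corrupt
             // packet is never ACK'ed with SUCCESS. Stand-in types and helpers below.
             class TailNodeAckSketch {
               enum Status { SUCCESS, ERROR_CHECKSUM } // stand-in for the protocol enum

               void receivePacket(byte[] data, byte[] checksums, long seqno)
                   throws java.io.IOException {
                 try {
                   verifyChunks(data, checksums);     // may throw on checksum mismatch
                 } catch (java.io.IOException e) {
                   // Tell upstream why we are stopping, then terminate this writer.
                   enqueueAck(seqno, Status.ERROR_CHECKSUM);
                   throw e;
                 }
                 writeToDisk(data, checksums);
                 enqueueAck(seqno, Status.SUCCESS);   // ack only after verification
               }

               void verifyChunks(byte[] d, byte[] c) throws java.io.IOException {}
               void writeToDisk(byte[] d, byte[] c) {}
               void enqueueAck(long seqno, Status s) {}
             }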

           Todd Lipcon added a comment -

          Thanks for looking into this, Kihwal. Your analysis makes sense to me.

          Arun - not sure this should be a blocker - while it's a nasty corruption issue, I don't think it's anything new. AFAIK the write pipeline has always had this issue since the ancient days, right? Or did the HDFS-265 rewrite end up changing the order of the ack send and the CRC verification?

           Suresh Srinivas added a comment -

           > while it's a nasty corruption issue, I don't think it's anything new...

           I think it's a good idea to keep this as a blocker even if the issue is not a new one, given it is a corruption issue. Nicholas, any comments on whether this applies to the old pipeline vs the new pipeline?

           Tsz Wo Nicholas Sze added a comment -

           This looks similar to HDFS-1595.

           Was the writer able to close the file successfully? If yes, then it is a real data loss case caused by only one faulty datanode.

           Tsz Wo Nicholas Sze added a comment -

          > ... Nicholas, any comments on if this applies to old pipeline vs new pipeline?

           Both the old and the new pipelines should have a similar problem since, when machine A sends some data to machine B and it fails, it is generally impossible to detect whether A, B or the network is faulty. Of course, we can detect it for some special cases, such as one of the machines being dead.

          > Potential blocker for 2.0.3-alpha.

          I would say that this is not a blocker for 2.0.3-alpha since this is not a regression.

           Kihwal Lee added a comment -

           > Was the writer able to close the file successfully?
           Yes. In one case the corrupt block ended up in the middle of a file. Of course all replicas were corrupt, so when the NN tried to raise the replication factor, all replicas got marked corrupt.

           I will post my patch for review, not for the precommit build, although I ran all tests and only testBlockCorruptionRecoveryPolicy2 failed, probably due to the change in how corruption recovery works. I haven't debugged it yet. Please take a look and see if the approach seems reasonable.

           Kihwal Lee added a comment -

           The attached patch is not a commit candidate. At a minimum, there is a test breakage to be fixed.

           Kihwal Lee added a comment -

           Two new patches for trunk are attached. The two have identical implementations of the bug fix. One includes a test case and the necessary changes to make the test work. Since the test is a bit invasive, a no-test version is also attached.

           The test simulates corruption during data transmission. It demonstrates the effectiveness of the patch. Without the patch, the corruption is detected much later, after the data is written and acked, which makes recovery impossible.

           Branch-0.23 requires a slightly different patch.

           Kihwal Lee added a comment -

           I originally changed PipelineAck to return errors instead of making the writer terminate. It generally worked, but some corner cases required a significant change in DFSOutputStream. So I decided to simplify: do not send an ACK back on checksum errors and terminate as it would in case of any other error. A local AckStatus is saved in each status-tracking packet (not the actual data packet), so that successful acks enqueued before the checksum error can still be sent.
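
           Conceptually, each queued ack entry carries its own status, along the lines of the sketch below (field names are illustrative, not the exact patch):

             // Sketch only: an ack-queue entry that remembers the status determined
             // when the packet was received, so earlier SUCCESS acks can still be
             // flushed even after a later packet hits a checksum error.
             class AckQueueEntrySketch {
               enum Status { SUCCESS, ERROR_CHECKSUM } // stand-in for the protocol enum

               final long seqno;
               final boolean lastPacketInBlock;
               final long offsetInBlock;
               final Status ackStatus; // per-packet local status, not a global flag

               AckQueueEntrySketch(long seqno, boolean lastPacketInBlock,
                                   long offsetInBlock, Status ackStatus) {
                 this.seqno = seqno;
                 this.lastPacketInBlock = lastPacketInBlock;
                 this.offsetInBlock = offsetInBlock;
                 this.ackStatus = ackStatus;
               }
             }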

           Kihwal Lee added a comment -

          Attaching patches for branch-0.23.

           Kihwal Lee added a comment -

          Attaching slightly optimized trunk patches.

           Kihwal Lee added a comment -

           Again, the test is a bit invasive, as it requires modification of DFSOutputStream. Nevertheless, it can effectively emulate data corruption during transmission and verify that the patch works.

           Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12555626/hdfs-3875.trunk.with.test.patch.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestReplaceDatanodeOnFailure

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3585//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/3585//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3585//console

          This message is automatically generated.

           Kihwal Lee added a comment -

           There is something missing in the latest patch that was in the original. The test failure is caused by it. I will post the updated patch in a moment.

           Kihwal Lee added a comment -

           Attaching the updated patch. The missing part applies only to the trunk patch, thus only the trunk version is updated.

           Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12555643/hdfs-3875.trunk.no.test.patch.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3587//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3587//console

          This message is automatically generated.

           Suresh Srinivas added a comment -

           Kihwal, I will review the patch shortly and post comments by the evening.

           Tsz Wo Nicholas Sze added a comment -

          Hi Kihwal,

           In a client write pipeline, only the last datanode verifies checksums. If there is a checksum error, we don't know what went wrong. It could be the case that one of the datanodes is faulty or a network path is faulty. So the client must stop, but cannot simply take out a datanode and continue. Do you agree?

           In the patch, only the last datanode possibly reports a checksum error. If it does, all statuses in the ack become ERROR_CHECKSUM. The approach seems reasonable.

          Some questions on the patch:

           • receivePacket() returns -1 on a checksum error. Why not throw an exception? Returning -1 should mean a normal exit.
          • The exception caught is not used below. Should it re-throw the exception?
            +      if (shouldVerifyChecksum()) {
            +        try {
            +          verifyChunks(dataBuf, checksumBuf);
            +        } catch (IOException e) {
            +          // checksum error detected locally. there is no reason to continue.
            +          if (responder != null) {
            +            ((PacketResponder) responder.getRunnable()).enqueue(seqno,
            +                lastPacketInBlock, offsetInBlock,
            +                Status.ERROR_CHECKSUM);
            +          }
            +          // return without writing data.
            +          checksumError = true;
            +          return -1;
            +        }
            
           Kihwal Lee added a comment -

           Whether it returns or throws an exception, the result is not very different. The packet responder will log the checksum error and initiate the shutdown. If an exception is thrown, the datanode ends up logging much more, with multiple stack traces. I thought clean termination of the writer (DataXceiver) thread was acceptable, since it is a controlled shutdown with a purpose and an expected outcome, rather than a panic shutdown. If you think throwing an exception makes more sense, I will update the patch.

           Kihwal Lee added a comment -

           Also, checksumError is set to true before returning, so receiveBlock() will actually end up throwing an exception. I experimented with receiveBlock() throwing an exception and also simply returning, and both worked fine. I thought you were referring to this.

           To answer your question better: the occurrence of the checksum error is already logged, and the exception thrown in receiveBlock() will clearly show what happened. We could catch the IOException in receiveBlock(), check checksumError, and rethrow.
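
           The catch-check-rethrow variant would look roughly like this (sketch only; receivePacket() and checksumError as in the patch under discussion, the rest stubbed out):

             // Sketch only: receiveBlock() surfaces the checksum error explicitly
             // instead of relying on a generic IOException from the receive loop.
             class ReceiveBlockSketch {
               volatile boolean checksumError;

               void receiveBlock() throws java.io.IOException {
                 try {
                   while (receivePacket() >= 0) {
                     // keep receiving; -1 means end of block or checksum error
                   }
                 } catch (java.io.IOException ioe) {
                   // Make the root cause obvious instead of a generic receive failure.
                   if (checksumError) {
                     throw new java.io.IOException(
                         "Terminating receiver due to a checksum error", ioe);
                   }
                   throw ioe;
                 }
               }

               int receivePacket() throws java.io.IOException { return -1; } // stub
             }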

           Tsz Wo Nicholas Sze added a comment -

           If you have already tested the patch well, I am okay with it.

          Suresh, any comment?

           Suresh Srinivas added a comment -

           It took me a lot of time to review this code. The BlockReceiver code is poorly documented; one of these days I will add some javadoc to make understanding and reviewing the code easier.

          Why do you have two variants of the patch - with and without tests?

          Comments for patch with no tests:

           1. The comment against #checksumError could read: "Indicates a checksum error. When set, block receiving and writing is stopped." It is better to initialize it to false at the declaration than in the constructor.
           2. #shouldVerifyChecksum() - could we describe the conditions under which the checksum needs to be verified in the javadoc? Along the lines of: "Checksum is verified in the following cases: 1. the datanode is the last one in the pipeline with no mirrorOut; 2. the block is being written by another datanode for replication; 3. checksum translation is needed." There is an equivalent comment where shouldVerifyChecksum() is presently called; that comment can be removed. (See the sketch after this list.)
           3. receivePacket() previously returned -1 when a block was completely written, or else the length of the packet received. Now it also returns -1 on a checksum error. It would be good to add javadoc to this method documenting when it returns -1.
           4. receivePacket() - do you think it is a good idea to print warn/info level logs when returning -1 on a checksum error or when checksumError is set? This will help debug these issues on each datanode in the pipeline using the logs. Given that these are rare errors, it should not take up too much log space.
           5. The comment "If there is a checksum error, responder will shut it down" - can you please clarify this comment to say "responder will shut itself down and interrupt the receiver."
           6. In #enqueue() - why is the checksumError check in a synchronized block? It can be outside, right?
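
           For reference, the condition described in comment 2 boils down to something like the sketch below (field names taken from the discussion; treat this as an illustration, not the exact BlockReceiver code):

             // Sketch of the three cases in which a datanode verifies checksums.
             class ShouldVerifySketch {
               java.io.OutputStream mirrorOut;   // null on the tail node
               boolean isDatanode;               // true for replication writes
               boolean needsChecksumTranslation; // client and disk checksums differ

               boolean shouldVerifyChecksum() {
                 // Verify when we are the tail, when another datanode is the writer,
                 // or when checksums must be recomputed anyway.
                 return mirrorOut == null || isDatanode || needsChecksumTranslation;
               }
             }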

           Tsz Wo Nicholas Sze added a comment -

          > The exception caught is not used below. ...

          Let's also change "boolean checksumError" to "IOException checksumException" so that it can record the actual exception. We should also log it.

           Kihwal Lee added a comment -

           The new patch addresses the review comments. checksumError is no more! Instead, exceptions are thrown and logged, so receivePacket() no longer returns -1 on a checksum error. The change in enqueue() has been removed, since no more ACKs will be sent anyway once the threads are shutting down.

          Ran all HDFS tests with one failure in TestEditLog#testFuzzSequences, which is being fixed in HDFS-4282.

           Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12559661/hdfs-3875.trunk.patch.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
          org.apache.hadoop.hdfs.server.namenode.TestEditLog

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3616//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3616//console

          This message is automatically generated.

           Kihwal Lee added a comment -

           The test failures are not caused by this patch; see HDFS-4282 and HDFS-3806.

           Kihwal Lee added a comment -

          Attaching a slightly improved/simplified patch.

           Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12560280/hdfs-3875.trunk.patch.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3630//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3630//console

          This message is automatically generated.

           Arun C Murthy added a comment -

          Any update on this? Thanks.

           Arun C Murthy added a comment -

          Kihwal? Todd?

           Suresh Srinivas added a comment -

           Arun, I will review this in the next two days.

           Arun C Murthy added a comment -

          Thanks Suresh Srinivas! I'm looking to wrap up 2.0.3-alpha asap.

           Suresh Srinivas added a comment -

           Arun, I am removing the blocker status based on the input from Todd and Nicholas in the above comments. However, I have started reviewing this - so let's try to get this into 2.0.3-alpha.

           Suresh Srinivas added a comment - edited

           Kihwal, here is how I understand the new behavior; correct me if I am wrong. In the following scenarios, the client is writing in a pipeline to datanodes d1, d2 and d3. At each point in the pipeline the data is marked as corrupt or not.

          client(not corrupt) d1(not corrupt) d2(not corrupt) d3(corrupt)

           • d3 detects corruption and reports a CHECKSUM_ERROR ACK to d2
           • d2 does not verify the checksum, so its status is SUCCESS, but it receives CHECKSUM_ERROR and shuts down
           • d1 does not verify the checksum. Its status is SUCCESS + MIRROR_ERROR.

           Only d1 is considered to be a valid copy even though d2 may not be corrupt.

          client(not corrupt) d1(not corrupt) d2(corrupt) d3(corrupt)

           • d3 detects corruption and reports a CHECKSUM_ERROR ACK to d2
           • d2 does not verify the checksum, so its status is SUCCESS, but it receives CHECKSUM_ERROR and shuts down
           • d1 does not verify the checksum. Its status is SUCCESS + MIRROR_ERROR.

           Only d1 is considered to be a valid copy.

          client(not corrupt) d1(corrupt) d2(corrupt) d3(corrupt)

           • d3 detects corruption and reports a CHECKSUM_ERROR ACK to d2
           • d2 does not verify the checksum, so its status is SUCCESS, but it receives CHECKSUM_ERROR and shuts down
           • d1 does not verify the checksum. Its status is SUCCESS + MIRROR_ERROR.

           d1 is still considered a valid copy. Is this correct?

          client(corrupt) d1(corrupt) d2(corrupt) d3(corrupt)

           • d3 detects corruption and reports a CHECKSUM_ERROR ACK to d2
           • d2 does not verify the checksum, so its status is SUCCESS, but it receives CHECKSUM_ERROR and shuts down
           • d1 does not verify the checksum. Its status is SUCCESS + MIRROR_ERROR.

           d1 is still considered a valid copy.

           In all the above cases, whether a node detects a checksum error itself or a downstream node detects it, the result appears the same to the upstream nodes (as a mirror error). Is that what you intended?

           Suresh Srinivas added a comment -

          Had an offline conversation with Kihwal. Here is one of the above scenarios in more detail (thanks Kihwal for explaining the current behavior).

          Client(not corrupt) d1(not corrupt) d2(not corrupt) d3(corrupt), where d3 for some reason sees only corrupt data.

           • d3 detects corruption and reports a CHECKSUM_ERROR ACK to d2. The packet is not written to disk on d3.
           • d2 does not verify the checksum, so its status is SUCCESS, but it receives CHECKSUM_ERROR and shuts down
           • d1 does not verify the checksum. Its status is {SUCCESS, MIRROR_ERROR}
           • the client re-establishes the pipeline with d1 and d3 and sends the packet again
           • d3 detects corruption again and reports a CHECKSUM_ERROR ACK to d1. The packet is not written to disk on d3.
           • d1 does not verify the checksum, so its status is SUCCESS, but it receives CHECKSUM_ERROR and shuts down
           • the client re-establishes the pipeline with d3 alone and sends the packet again
           • d3 detects corruption again and reports a CHECKSUM_ERROR ACK. The packet is not written to disk on d3.
           • The client fails to write the packet and abandons writing the file?

           The current behavior repeatedly keeps the node that sees corruption (or is corrupting the data) in pipeline recovery (d3 above), while the nodes that did not see corruption get dropped from the pipeline. Having a datanode perform checksum verification when a downstream datanode reports a checksum error should avoid this. With that, the recovered pipeline will extend up to the point of corruption in the pipeline.

           Kihwal, add comments if I missed something.

           Kihwal Lee added a comment -

          Sorry for getting back to this so late, and thank you, Suresh, for the feedback.

          It made me think more about the non-leaf nodes in a pipeline. If the leaf node disappears from the pipeline before reporting a checksum error and recoverRbw() is done, we can end up with a latent checksum error in the block. This is because datanodes won't discard already written data on pipeline recovery. It looks like we have to make recoverRbw() truncate blocks to the acked size to be really safe.

          Also, the client should give up after a certain number of pipeline reconstructions for the same block.

          kihwal Kihwal Lee added a comment -

          The new patch forces datanodes to truncate the block being recovered to the acked length. Since the nodes in the middle of the write pipeline do not perform checksum verification and write data to disk before getting an ack back from downstream, the unacked portion can contain corrupt data. If the last node simply disappears before reporting a checksum error upstream, the current pipeline recovery mechanism can overlook corruption in the written data.

          Since this truncation discards the potentially corrupt portion of the block, we do not need any explicit checksum re-verification on a checksum error.

          Another new feature in the latest patch terminates the HDFS client when pipeline recovery is attempted more than 5 times while writing the same data packet, since this likely indicates the source data is corrupt. In a very small cluster, clients may run out of datanodes and fail before retrying 5 times.
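
          To make the retry cap concrete, here is a minimal sketch of the client-side bail-out logic. The class, method, and constant names are illustrative assumptions; only the lastAckedSeqnoBeforeFailure field is named in the review below, and the patch's actual DFSOutputStream code may differ:

    import java.io.IOException;

    // Sketch of "give up after 5 recoveries of the same packet".
    class PipelineRecoveryLimitSketch {
      private long lastAckedSeqnoBeforeFailure = -1;
      private int pipelineRecoveryCount = 0;
      private static final int MAX_RECOVERIES_PER_PACKET = 5;

      void onPipelineFailure(long lastAckedSeqno) throws IOException {
        if (lastAckedSeqno == lastAckedSeqnoBeforeFailure) {
          // No packet has been acked since the previous failure, so
          // this recovery attempt is for the same packet as last time.
          if (++pipelineRecoveryCount > MAX_RECOVERIES_PER_PACKET) {
            throw new IOException("Already retried " + MAX_RECOVERIES_PER_PACKET
                + " times for the same packet; the source data is likely corrupt.");
          }
        } else {
          // Progress was made since the last failure; restart the count.
          lastAckedSeqnoBeforeFailure = lastAckedSeqno;
          pipelineRecoveryCount = 1;
        }
      }
    }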

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12572398/hdfs-3875.patch.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 tests included appear to have a timeout.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4044//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4044//console

          This message is automatically generated.

          lohit Lohit Vijayarenu added a comment -

          Could this be targeted for the 2.0.5 release? We are seeing this exact same issue on one of our clusters. We are running the hadoop-2.0.3-alpha release.

          tgraves Thomas Graves added a comment -

          Suresh, Todd, any comments on the latest patch? I am hoping to get this committed soon for 0.23.8.

          sureshms Suresh Srinivas added a comment -

          Sorry, I have been meaning to look at this but have not been able to spend time on it. Will review before the end of the day.

          sureshms Suresh Srinivas added a comment -

          Kihwal Lee, the new solution looks much better. Nice work!

          Some minor comments. +1 with those addressed:

          1. DFSOutputStream.java
            • Initialize lastAckedSeqnoBeforeFailure to an appropriate value; lastAckedSeqNo is initialized to -1.
            • Change the info log to a warn? Also, instead of "Already tried 5 times" -> "Already retried 5 times", given that total attempts are 6 and retries are 5.
          2. DFSClientFaultInjector#uncorruptPacket() - does it need to throw IOException?
          kihwal Kihwal Lee added a comment -

          Thanks for the review, Suresh. The latest patch addresses all review comments.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12583835/hdfs-3875.patch.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4416//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4416//console

          This message is automatically generated.

          sureshms Suresh Srinivas added a comment -

          +1 for the patch

          tlipcon Todd Lipcon added a comment -

          Sorry it took me some time to get to this. A couple small questions below:

          +              // Wait until the responder sends back the response
          +              // and interrupt this thread.
          +              Thread.sleep(3000);
          

          Can you explain this sleep here a little further? The assumption is that the responder will come back and interrupt the streamer? Why do we need to wait instead of just bailing out immediately with the IOE? Will this cause a 3-second delay in re-establishing the pipeline again?


          +        // If the mirror has reported that it received a corrupt packet,
          +        // do self-destruct to mark myself bad, instead of making the 
          +        // mirror node bad. The mirror is guaranteed to be good without
          +        // corrupt data on disk.
          

          What if the issue is on the receiving NIC of the downstream node? In this case, it would be kept around in the next pipeline and likely cause an exception again, right?


          +      // corrupt the date for testing.
          

          typo: date

          kihwal Kihwal Lee added a comment -

          Can you explain this sleep here a little further? The assumption is that the responder will come back and interrupt the streamer? Why do we need to wait instead of just bailing out immediately with the IOE? Will this cause a 3-second delay in re-establishing the pipeline again?

          This gives the responder time to send the checksum error back upstream, so that the upstream node can blow up and exclude itself from the pipeline. This may not always be ideal since there can be many different failure modes, but if anything needs to be eliminated without knowing the cause, the source seems to be a better candidate than the sink, which actually verifies the checksum.

          Unless there is a network issue in sending ACKs, the responder will immediately terminate and interrupt the main writer thread, so the thread won't stay up. Even if the thread stays up for some reason, recoverRbw() during pipeline recovery will interrupt it, so there won't be a 3-second delay.

          If the last node in a pipeline has a faulty NIC, the two upstream nodes will be eliminated (in the 3-replica case), and after a new datanode is added to the end of the pipeline, the faulty node will be removed. Issues on intermediate nodes will be handled in fewer iterations. The worst case is when the data is corrupt in DFSOutputStream itself, which will be detected after hitting the maximum number of retries; there is no recovery in that case.
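
          As a rough sketch of this self-removal behavior (the enum and method names here are assumptions for illustration, not the actual BlockReceiver code; only the sleep-then-throw control flow mirrors the patch), the upstream datanode reacts to a downstream checksum-error ack roughly like this:

    import java.io.IOException;

    class ChecksumAckSketch {
      enum Status { SUCCESS, ERROR, ERROR_CHECKSUM }

      void handleMirrorAck(Status mirrorStatus) throws IOException {
        if (mirrorStatus == Status.ERROR_CHECKSUM) {
          try {
            // Give the responder time to forward the checksum error
            // upstream before this writer thread dies.
            Thread.sleep(3000);
          } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
          }
          // Throwing here removes this (upstream) node from the pipeline,
          // instead of blaming the downstream node that verified the data.
          throw new IOException(
              "Downstream reported a checksum error; shutting down this node");
        }
      }
    }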

          kihwal Kihwal Lee added a comment -

          The BlockReceiver in branch-2 seems to be in between 0.23 and trunk, so I am adding a patch for branch-2 as well. The new trunk patch corrects the typo in the comment.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12583880/hdfs-3875.patch.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4417//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4417//console

          This message is automatically generated.

          tlipcon Todd Lipcon added a comment -

          OK, thanks for the explanations. +1 from me.

          kihwal Kihwal Lee added a comment -

          I've committed this to trunk, branch-2 and branch-0.23.
          Thanks everybody for the reviews.

          hudson Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3771 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3771/)
          HDFS-3875. Issue handling checksum errors in write pipeline. Contributed by Kihwal Lee. (Revision 1484808)

          Result = SUCCESS
          kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1484808
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClientFaultInjector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestCrcCorruption.java
          hudson Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #217 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/217/)
          HDFS-3875. Issue handling checksum errors in write pipeline. Contributed by Kihwal Lee. (Revision 1484808)

          Result = FAILURE
          kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1484808
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClientFaultInjector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestCrcCorruption.java
          hudson Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #615 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/615/)
          HDFS-3875. Issue handling checksum errors in write pipeline. Contributed by Kihwal Lee. (Revision 1484811)

          Result = SUCCESS
          kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1484811
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClientFaultInjector.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestCrcCorruption.java
          hudson Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1406 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1406/)
          HDFS-3875. Issue handling checksum errors in write pipeline. Contributed by Kihwal Lee. (Revision 1484808)

          Result = FAILURE
          kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1484808
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClientFaultInjector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestCrcCorruption.java
          hudson Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1433 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1433/)
          HDFS-3875. Issue handling checksum errors in write pipeline. Contributed by Kihwal Lee. (Revision 1484808)

          Result = SUCCESS
          kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1484808
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClientFaultInjector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestCrcCorruption.java
          yzhangal Yongjun Zhang added a comment -

          Hi Kihwal Lee,

          Thanks for your earlier work on this issue. We are seeing a similar problem even though we have this patch. One question about the patch:

          Assume we have a pipeline of three DNs: DN1, DN2, and DN3. DN3 detects a checksum error and reports back to DN2. DN2 then truncates its replica to the acknowledged size by calling static private void truncateBlock(File blockFile, File metaFile, ...), which reads the data from the local replica file, calculates the checksum for the length being truncated to, and writes the checksum back to the meta file.

          My question is: when writing the checksum back to the meta file, this method doesn't check it against an already computed checksum to see if it matches. However, DN3 does check its computed checksum against the checksum sent from upstream in the pipeline when reporting the checksum mismatch. If DN2 got something wrong in the truncateBlock method (say, for some reason the existing data is corrupted), then DN2 has an incorrect checksum and is not aware of it. Later, when we try to recover the pipeline and use the DN2 replica as the source, the new DN that receives data from DN2 will always find a checksum error.

          This is my speculation so far. Do you think this is a possibility?

          Thanks a lot.
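
          To make the concern concrete, here is a simplified sketch of the truncate-and-recompute step, not the actual FsDatasetImpl.truncateBlock() source: the meta file layout assumed here (7-byte header, 4-byte CRC32 entries) only loosely mirrors HDFS, and java.util.zip.CRC32 stands in for the real checksum machinery. The point it illustrates is that the new last chunk's checksum is recomputed from whatever bytes are on disk and written out, without ever being compared to the previously stored checksum:

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.zip.CRC32;

    class TruncateBlockSketch {
      static void truncateBlock(File blockFile, File metaFile,
          long newLen, int bytesPerChecksum) throws IOException {
        long lastChunkStart = (newLen / bytesPerChecksum) * bytesPerChecksum;
        int lastChunkLen = (int) (newLen - lastChunkStart);

        try (RandomAccessFile blockRAF = new RandomAccessFile(blockFile, "rw");
             RandomAccessFile metaRAF = new RandomAccessFile(metaFile, "rw")) {
          // Read back the bytes that will form the new, partial last chunk.
          // If these bytes are already corrupt on disk, that corruption is
          // silently "re-signed" by the checksum computed below. This is
          // the gap Yongjun is asking about.
          byte[] tail = new byte[lastChunkLen];
          blockRAF.seek(lastChunkStart);
          blockRAF.readFully(tail);
          blockRAF.setLength(newLen);

          // Shrink the meta file to cover only the retained chunks, then
          // overwrite the final checksum entry with the recomputed value.
          long metaLen = 7 /* assumed header size */
              + (newLen / bytesPerChecksum) * 4
              + (lastChunkLen > 0 ? 4 : 0);
          metaRAF.setLength(metaLen);
          if (lastChunkLen > 0) {
            CRC32 crc = new CRC32();
            crc.update(tail, 0, lastChunkLen);
            metaRAF.seek(metaLen - 4);
            metaRAF.writeInt((int) crc.getValue());
          }
        }
      }
    }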

          yzhangal Yongjun Zhang added a comment -

          Hi Kihwal Lee, I filed HDFS-6937 to track the similar issue I'm seeing, so we can continue the discussion there. Thanks.


            People

            • Assignee: kihwal Kihwal Lee
            • Reporter: tlipcon Todd Lipcon
            • Votes: 0
            • Watchers: 21
