[HADOOP-4702] Failed block replication leaves an incomplete block in receiver's tmp data directory - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.17.2
Fix Version/s: 0.18.3
Component/s: None
Labels: None
Hadoop Flags: Reviewed

Description

When a failure occurs while replicating a block from a source DataNode to a target DataNode, the target node keeps an incomplete on-disk copy of the block in its temp data directory and an in-memory copy of the block in the ongoingCreates queue. This causes two problems:
1. Since this block is not (and should not be) finalized, the NameNode is unaware that the incomplete block exists. It may schedule replication of the same block to this node again, which fails with the message: "Block XX has already been started (though not completed), and thus cannot be created."
2. Restarting the DataNode promotes the blocks under the temp data directory to valid blocks, thus introducing corrupted blocks into HDFS. These corrupted blocks can stay in the system undetected if the partial block happens to match its checksums.

A failed block replication should clean up both the in-memory and on-disk copies of the incomplete block.
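
The cleanup described above can be sketched as follows. This is a minimal illustration only, not the code from the attached patches: the class `TmpBlockReceiver`, its methods, and the `blk_<id>` file naming are hypothetical stand-ins, and a plain set plays the role of the ongoingCreates record. The point it shows is that any failure while receiving the block removes both the in-memory entry and the partial file in the temp directory, so a later retry is not rejected and a restart finds nothing to promote.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a DataNode block receiver; not Hadoop's actual classes. */
class TmpBlockReceiver {
    private final Path tmpDir;
    // Illustrative counterpart of the in-memory ongoingCreates record.
    private final Set<Long> ongoingCreates = ConcurrentHashMap.newKeySet();

    TmpBlockReceiver(Path tmpDir) {
        this.tmpDir = tmpDir;
    }

    /** Location of the (possibly partial) block file in the temp directory. */
    Path tmpFile(long blockId) {
        return tmpDir.resolve("blk_" + blockId);
    }

    boolean isOngoing(long blockId) {
        return ongoingCreates.contains(blockId);
    }

    /**
     * Copies a replicated block into the temp directory. If the copy fails or
     * is shorter or longer than expected, both the in-memory entry and the
     * partial file are removed before the exception propagates. Finalization
     * of a successfully received block is outside the scope of this sketch.
     */
    void receiveBlock(long blockId, InputStream in, long expectedLen) throws IOException {
        if (!ongoingCreates.add(blockId)) {
            throw new IOException("Block " + blockId
                + " has already been started (though not completed), and thus cannot be created.");
        }
        Path tmp = tmpFile(blockId);
        boolean complete = false;
        // The output stream is closed before the finally block runs,
        // so the partial file can be deleted safely.
        try (OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buf = new byte[4096];
            long copied = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                copied += n;
            }
            if (copied != expectedLen) {
                throw new IOException("Block " + blockId + ": received " + copied
                    + " bytes, expected " + expectedLen);
            }
            complete = true;
        } finally {
            if (!complete) {
                ongoingCreates.remove(blockId); // drop the in-memory record
                Files.deleteIfExists(tmp);      // drop the partial on-disk copy
            }
        }
    }
}
```

With this shape, a second call to `receiveBlock` for the same block ID after a failed attempt starts from a clean state instead of hitting the "already been started" error.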

Attachments

tmpBlockRemoval.patch, 05/Dec/08 23:14, 7 kB, Hairong Kuang
tmpBlockRemoval1.patch, 08/Dec/08 19:38, 7 kB, Hairong Kuang
tmpBlockRemoval2.patch, 08/Dec/08 22:23, 8 kB, Hairong Kuang

Issue Links

incorporates: HADOOP-5192 Block reciever should not remove a finalized block when block replication fails (Closed)

is related to: HDFS-142 In 0.20, move blocks being written into a blocksBeingWritten directory (Closed)

People

Assignee: Hairong Kuang

Reporter: Hairong Kuang

Votes: 0

Watchers: 3

Dates

Created: 20/Nov/08 23:14

Updated: 08/Jul/09 16:43

Resolved: 09/Dec/08 22:03