
Key: HADOOP-4702
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: Hairong Kuang
Reporter: Hairong Kuang
Votes: 0
Watchers: 3
Hadoop Common

Failed block replication leaves an incomplete block in receiver's tmp data directory

Created: 20/Nov/08 11:14 PM   Updated: 08/Jul/09 04:43 PM
Component/s: None
Affects Version/s: 0.17.2
Fix Version/s: 0.18.3

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
tmpBlockRemoval.patch, 2008-12-05 11:14 PM, Hairong Kuang, 7 kB
tmpBlockRemoval1.patch, 2008-12-08 07:38 PM, Hairong Kuang, 7 kB
tmpBlockRemoval2.patch, 2008-12-08 10:23 PM, Hairong Kuang, 8 kB
Issue Links:
Incorporates
 
Reference
 

Hadoop Flags: Reviewed
Resolution Date: 09/Dec/08 10:03 PM


Description
When a failure occurs while replicating a block from a source DataNode to a target DataNode, the target node keeps an incomplete on-disk copy of the block in its temp data directory and an in-memory copy of the block in the ongoingCreates queue. This causes two problems:
1. Since this block is not (and should not be) finalized, the NameNode is not aware of the existence of this incomplete block. It may schedule replication of the same block to this node again, which will fail with the message: "Block XX has already been started (though not completed), and thus cannot be created."
2. Restarting the DataNode promotes the blocks under the temp data directory to valid blocks, thus introducing corrupted blocks into HDFS. Sometimes those corrupted blocks stay in the system undetected, if the partial block happens to match its checksums.

A failed block replication should clean up both the in-memory and on-disk copies of the incomplete block.
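
The cleanup requested above can be illustrated with a minimal, self-contained sketch. This is not the attached patch and not the actual FSDataset or BlockReceiver code: the class TmpBlockStore, its methods, and the "blk_<id>" file naming are all hypothetical, and the map named ongoingCreates is only a stand-in for the real queue. It shows one way a receiver could guarantee that a failed transfer leaves neither an in-memory entry nor a partial file behind.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; not the real HDFS DataNode classes. */
class TmpBlockStore {
    private final Path tmpDir;        // holds blocks still being received
    private final Path finalizedDir;  // holds complete, valid blocks
    // Stand-in for the ongoingCreates queue: block id -> temp file.
    private final Map<Long, Path> ongoingCreates = new ConcurrentHashMap<>();

    TmpBlockStore(Path tmpDir, Path finalizedDir) {
        this.tmpDir = tmpDir;
        this.finalizedDir = finalizedDir;
    }

    /** Receives a block; on any failure, removes both copies of the partial block. */
    void receiveBlock(long blockId, InputStream in) throws IOException {
        Path tmpFile = tmpDir.resolve("blk_" + blockId);
        if (ongoingCreates.putIfAbsent(blockId, tmpFile) != null) {
            throw new IOException("Block " + blockId
                + " has already been started (though not completed),"
                + " and thus cannot be created.");
        }
        boolean success = false;
        try {
            try (OutputStream out = Files.newOutputStream(tmpFile)) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            }
            // Only a fully received block is promoted out of the temp directory.
            Files.move(tmpFile, finalizedDir.resolve(tmpFile.getFileName()));
            success = true;
        } finally {
            // In-memory cleanup: always drop the entry once the attempt ends.
            ongoingCreates.remove(blockId);
            // On-disk cleanup: delete the partial file if the attempt failed.
            if (!success) {
                Files.deleteIfExists(tmpFile);
            }
        }
    }

    boolean isOngoing(long blockId) {
        return ongoingCreates.containsKey(blockId);
    }

    boolean tmpFileExists(long blockId) {
        return Files.exists(tmpDir.resolve("blk_" + blockId));
    }

    boolean isFinalized(long blockId) {
        return Files.exists(finalizedDir.resolve("blk_" + blockId));
    }
}
```

With cleanup in the finally block, a later attempt to replicate the same block to this node is no longer rejected as "already started", and a restart finds no partial file in the temp directory to promote.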



Subversion Commits
Repository Revision Date User Message
ASF #724883 Tue Dec 09 20:55:51 UTC 2008 hairong HADOOP-4702. Failed block replication leaves an incomplete block in receiver's tmp data directory. Contributed by Hairong Kuang.
Files Changed
MODIFY /hadoop/core/trunk/src/hdfs/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
MODIFY /hadoop/core/trunk/src/hdfs/org/apache/hadoop/hdfs/server/datanode/FSDataset.java
MODIFY /hadoop/core/trunk/CHANGES.txt
MODIFY /hadoop/core/trunk/src/test/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java

Repository Revision Date User Message
ASF #724887 Tue Dec 09 20:58:29 UTC 2008 hairong Merge -r 724882:724883 from main to move the change log of HADOOP-4702 into release 0.19.
Files Changed
MODIFY /hadoop/core/branches/branch-0.19/src/test/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java
MODIFY /hadoop/core/branches/branch-0.19
MODIFY /hadoop/core/branches/branch-0.19/src/hdfs/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
MODIFY /hadoop/core/branches/branch-0.19/src/hdfs/org/apache/hadoop/hdfs/server/datanode/FSDataset.java
MODIFY /hadoop/core/branches/branch-0.19/CHANGES.txt

Repository Revision Date User Message
ASF #724907 Tue Dec 09 21:57:59 UTC 2008 hairong Merge -r 724882:724883 from main to move the change log of HADOOP-4702 into release 0.18.
Files Changed
MODIFY /hadoop/core/branches/branch-0.18/src/hdfs/org/apache/hadoop/dfs/DataNode.java
MODIFY /hadoop/core/branches/branch-0.18/src/test/org/apache/hadoop/dfs/TestDiskError.java
MODIFY /hadoop/core/branches/branch-0.18/src/hdfs/org/apache/hadoop/dfs/FSDataset.java
MODIFY /hadoop/core/branches/branch-0.18/CHANGES.txt