Details
-
Bug
-
Status: Closed
-
Blocker
-
Resolution: Fixed
-
0.18.1
-
None
-
None
-
CentOS 5.2, JDK 1.6,
16 Datanodes and 1 Namenode, each with 8 GB of memory and a 4-core CPU, connected by Gigabit Ethernet
-
Reviewed
Description
We recently deployed a 0.18.1 cluster and ran some tests. We found that
if we corrupt a block, the namenode detects it and re-replicates it as soon as
a client reads that block. However, the namenode also deletes a healthy block
(the source of the above replication operation) at the same time. (I think this
issue may affect the whole 0.18 tree.)
After some tracing, I found that FSNamesystem.addStoredBlock()
checks the number of replicas after adding the block to blocksMap:
NumberReplicas num = countNodes(storedBlock);
int numLiveReplicas = num.liveReplicas();
int numCurrentReplica = numLiveReplicas
    + pendingReplications.getNumReplicas(block);
This means that all live replicas and all pending replications are
counted. But at the end of FSNamesystem.blockReceived(), which
calls addStoredBlock(), addStoredBlock() is called first and only then
is the pendingReplications count reduced:
//
// Modify the blocks->datanode map and node's map.
//
addStoredBlock(block, node, delHintNode );
pendingReplications.remove(block);
Hence the newly received replica is counted twice, once as a live replica
and once as a pending replication. The block then looks over-replicated,
one replica is marked as excess, and a healthy replica is deleted by mistake.
I think swapping these two calls in blockReceived() may solve this
issue:
--- FSNamesystem.java-orig 2008-11-28 13:34:40.000000000 +0800
+++ FSNamesystem.java 2008-11-28 13:54:12.000000000 +0800
@@ -3152,8 +3152,8 @@
//
// Modify the blocks->datanode map and node's map.
//
- addStoredBlock(block, node, delHintNode );
pendingReplications.remove(block);
+ addStoredBlock(block, node, delHintNode );
}
long[] getStats() throws IOException {
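The effect of the call order can be checked with a small toy model. This is only a sketch, not the FSNamesystem code: the class ReplicaCountSketch, its fields, and the method excessSeen() are hypothetical names, the target replication of 3 is an assumption, and the datanode addresses are taken from the logs in this report. It models one block with two healthy replicas and one pending replication, then receives the new replica in either order.

```java
import java.util.HashSet;
import java.util.Set;

public class ReplicaCountSketch {
    // Hypothetical stand-ins for one block's entries in blocksMap and pendingReplications.
    private final Set<String> liveReplicas = new HashSet<>();
    private int pendingCount;

    ReplicaCountSketch(int pendingCount, String... healthyHolders) {
        this.pendingCount = pendingCount;
        for (String holder : healthyHolders) {
            liveReplicas.add(holder);
        }
    }

    // Uses the same sum as the quoted addStoredBlock() lines:
    // live replicas plus pending replications.
    int addStoredBlock(String node) {
        liveReplicas.add(node);
        return liveReplicas.size() + pendingCount;
    }

    // Models pendingReplications.remove(block) for a single pending entry.
    void removePending() {
        if (pendingCount > 0) {
            pendingCount--;
        }
    }

    // Returns how many replicas look like excess when the new replica is reported.
    static int excessSeen(boolean removePendingFirst) {
        final int targetReplication = 3; // assumed replication factor
        // Two healthy replicas remain; one replication to 192.168.33.45 is pending.
        ReplicaCountSketch block =
            new ReplicaCountSketch(1, "192.168.33.44", "192.168.33.50");
        int counted;
        if (removePendingFirst) {
            block.removePending();                            // proposed order
            counted = block.addStoredBlock("192.168.33.45");
        } else {
            counted = block.addStoredBlock("192.168.33.45");  // original order
            block.removePending();
        }
        return Math.max(0, counted - targetReplication);
    }

    public static void main(String[] args) {
        System.out.println("addStoredBlock first: excess = " + excessSeen(false));
        System.out.println("remove pending first: excess = " + excessSeen(true));
    }
}
```

Under these assumptions the original order counts 3 live replicas plus 1 pending replication, giving 4 and therefore one apparent excess replica, while the swapped order counts exactly 3.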
The following are the logs for the mistaken deletion, with additional
logging inserted by me.
2008-11-28 11:22:08,866 INFO org.apache.hadoop.dfs.StateChange: DIR
NameNode.reportBadBlocks
2008-11-28 11:22:08,866 INFO org.apache.hadoop.dfs.StateChange: BLOCK
NameSystem.addToCorruptReplicasMap: blk_3828935579548953768 added as
corrupt on 192.168.33.51:50010 by /192.168.33.51
2008-11-28 11:22:10,179 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
ask 192.168.33.50:50010 to replicate blk_3828935579548953768_1184 to
datanode(s) 192.168.33.45:50010
2008-11-28 11:22:12,629 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 192.168.33.45:50010 is
added to blk_3828935579548953768_1184 size 67108864
2008-11-28 11:22:12,629 INFO org.apache.hadoop.dfs.StateChange: Wang
Xu* NameSystem.addStoredBlock: current replicas 4 in which has 1
pendings
2008-11-28 11:22:12,630 INFO org.apache.hadoop.dfs.StateChange: DIR*
NameSystem.invalidateBlock: blk_3828935579548953768_1184 on
192.168.33.51:50010
2008-11-28 11:22:12,630 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.delete: blk_3828935579548953768 is added to invalidSet of
192.168.33.51:50010
2008-11-28 11:22:13,180 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
ask 192.168.33.44:50010 to delete blk_3828935579548953768_1184
2008-11-28 11:22:13,181 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
ask 192.168.33.51:50010 to delete blk_3828935579548953768_1184