Description
We found the following error in the log of the destination datanode:
2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248
java.io.IOException: Terminating due to a checksum error.java.io.IOException: Unexpected checksum mismatch while writing BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 from /192.168.2.101:39495
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
    at java.lang.Thread.run(Thread.java:744)
On the source datanode, however, the log reports that the block was transmitted:
2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer: Transmitted BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 (numBytes=16188152) to /192.168.2.103:50010
When the destination datanode detects the checksum mismatch, it reports a bad block to the NameNode, and the NameNode marks the replica on the source datanode as corrupt. However, the replica on the source datanode is actually valid, because it passes checksum verification.
In short, the replica on the source datanode is wrongly marked as corrupt.
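The situation can be illustrated with a minimal sketch. This is not HDFS code: the helper names (chunk_checksums, first_bad_chunk) are hypothetical, zlib.crc32 stands in for the checksum HDFS actually uses, and the 512-byte chunk size is an assumption chosen to match the default of dfs.bytes-per-checksum. It shows that a mismatch seen by the receiver does not imply that the replica stored on the source is bad.

```python
import zlib
from typing import List, Optional

# Assumed chunk size for illustration (the default of dfs.bytes-per-checksum).
BYTES_PER_CHECKSUM = 512


def chunk_checksums(data: bytes, chunk_size: int = BYTES_PER_CHECKSUM) -> List[int]:
    """Compute one CRC32 per fixed-size chunk of the data (hypothetical helper)."""
    return [
        zlib.crc32(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def first_bad_chunk(
    data: bytes, expected: List[int], chunk_size: int = BYTES_PER_CHECKSUM
) -> Optional[int]:
    """Return the index of the first chunk whose CRC differs, or None if all match."""
    actual = chunk_checksums(data, chunk_size)
    if len(actual) != len(expected):
        # Length mismatch: the first missing or extra chunk is the bad one.
        return min(len(actual), len(expected))
    for index, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            return index
    return None


# The replica as stored on the source datanode, with its stored checksums.
source_replica = bytes(range(256)) * 8  # 2048 bytes, i.e. 4 chunks
stored_checksums = chunk_checksums(source_replica)

# Simulate a single bit flipped somewhere between source and destination.
received = bytearray(source_replica)
received[1030] ^= 0x01  # byte 1030 falls in chunk index 2
received = bytes(received)

source_result = first_bad_chunk(source_replica, stored_checksums)
receiver_result = first_bad_chunk(received, stored_checksums)

print("source replica bad chunk:", source_result)
print("received data bad chunk:", receiver_result)
```

In this sketch the source replica verifies cleanly while the received copy fails, so marking the source replica as corrupt based only on the receiver's report would be the wrong conclusion.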
Attachments
Issue Links
- duplicates
  - HDFS-11160 VolumeScanner reports write-in-progress replicas as corrupt incorrectly (Resolved)
- Is contained by
  - HDFS-11056 Concurrent append and read operations lead to checksum error (Resolved)
- is related to
  - HDFS-10587 Incorrect offset/length calculation in pipeline recovery causes block corruption (Resolved)
  - HDFS-11056 Concurrent append and read operations lead to checksum error (Resolved)
- relates to
  - HDFS-6937 Another issue in handling checksum errors in write pipeline (Resolved)
  - HDFS-11160 VolumeScanner reports write-in-progress replicas as corrupt incorrectly (Resolved)