Description
We are seeing cases of HDFS blocks being marked as corrupt after the initial block receive fails during an updatePipeline operation and the subsequent HDFS recovery for the block fails as well. Here is the life cycle of a block, blk_1342440165_272630578, that was ultimately marked as corrupt:
1. The block creation starts at the NameNode as part of the update pipeline process:
2021-01-25 03:41:17,335 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem (IPC Server handler 61 on 8020): updatePipeline(blk_1342440165_272500939 => blk_1342440165_272630578) success
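A side note on the naming used throughout these logs: blk_<blockId>_<generationStamp> combines the immutable block ID with the generation stamp, so an updatePipeline keeps the same block ID and only bumps the generation stamp. A tiny standalone illustration (not Hadoop code), using the three names that appear in this report:

{code:java}
// Illustrative only -- parses the blk_<blockId>_<generationStamp> names seen in
// these logs. The block ID (1342440165) never changes; each pipeline update
// only bumps the generation stamp.
public class BlockNames {
    public static void main(String[] args) {
        String[] stages = {
            "blk_1342440165_272500939",   // original pipeline
            "blk_1342440165_272630578",   // after the first updatePipeline (step 1)
            "blk_1342440165_272645069"    // after the recovery in step 6
        };
        for (String name : stages) {
            String[] parts = name.split("_");
            System.out.printf("blockId=%s generationStamp=%s%n", parts[1], parts[2]);
        }
    }
}
{code}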
2. The block receiver on the DataNode fails with a socket timeout exception, and so do the retries:
2021-01-25 03:42:22,525 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (PacketResponder: BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.21.246.239:50010]): PacketResponder: BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.21.246.239:50010]
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.226.26:56294 remote=/172.21.246.239:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:400)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1305)
        at java.lang.Thread.run(Thread.java:748)
2021-01-25 03:42:22,526 WARN org.apache.hadoop.hdfs.server.datanode.DataNode (PacketResponder: BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.21.246.239:50010]): IOException in BlockReceiver.run():
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:468)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1552)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1489)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1402)
        at java.lang.Thread.run(Thread.java:748)
2021-01-25 03:42:22,526 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (PacketResponder: BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.21.246.239:50010]): PacketResponder: BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.21.246.239:50010]
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:468)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1552)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1489)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1402)
        at java.lang.Thread.run(Thread.java:748)
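For reference, the failure mode in step 2 is the PacketResponder on the upstream DataNode waiting for an ack from the downstream node (172.21.246.239:50010) and giving up after the configured timeout (65 000 ms here). Below is a minimal, self-contained Java sketch of that general pattern; it uses plain sockets and a short timeout rather than Hadoop's SocketIOWithTimeout:

{code:java}
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Illustrative only: a peer that accepts the connection but never responds,
// so the read blocks until the socket timeout fires -- the same shape as the
// PacketResponder waiting for a downstream ack in step 2.
public class AckTimeoutDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket silentPeer = new ServerSocket(0);
             Socket upstream = new Socket("127.0.0.1", silentPeer.getLocalPort());
             Socket peerSide = silentPeer.accept()) {
            upstream.setSoTimeout(2_000); // stand-in for the 65 s seen in the log
            try (InputStream in = upstream.getInputStream()) {
                in.read();                // blocks waiting for an "ack" that never comes
            } catch (SocketTimeoutException e) {
                System.out.println("timed out waiting for ack: " + e.getMessage());
            }
        }
    }
}
{code}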
3. The DataNode stops receiving the block and initiates recovery:
2021-01-25 03:42:22,526 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (PacketResponder: BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.21.246.239:50010]): PacketResponder: BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.21.246.239:50010] terminating
2021-01-25 03:42:22,529 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (DataXceiver for client DFSClient_NONMAPREDUCE_390070548_42 at /172.21.240.31:60430 [Receiving block BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578]): Receiving BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578 src: /172.21.240.31:60430 dest: /172.21.226.26:50010
2021-01-25 03:42:22,529 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl (DataXceiver for client DFSClient_NONMAPREDUCE_390070548_42 at /172.21.240.31:60430 [Receiving block BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578]): Recover RBW replica BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578
4. The recovery on the DataNode for blk_1342440165_272630578 fails with:
2021-01-25 03:42:34,223 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (DataXceiver for client DFSClient_NONMAPREDUCE_390070548_42 at /172.21.240.31:45918 [Receiving block BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272500939]): Exception for BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578
java.nio.channels.ClosedByInterruptException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
        at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:269)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.adjustCrcChannelPosition(FsDatasetImpl.java:1716)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.adjustCrcFilePosition(BlockReceiver.java:1052)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:733)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:923)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:854)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:288)
        at java.lang.Thread.run(Thread.java:748)
2021-01-25 03:42:34,223 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (DataXceiver for client DFSClient_NONMAPREDUCE_390070548_42 at /172.21.240.31:45918 [Receiving block BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272500939]): opWriteBlock BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578 received exception java.nio.channels.ClosedByInterruptException
2021-01-25 03:42:34,223 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode (DataXceiver for client DFSClient_NONMAPREDUCE_390070548_42 at /172.21.240.31:45918 [Receiving block BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272500939]): ip-172-21-226-26.ec2.internal:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.21.240.31:45918 dst: /172.21.226.26:50010
java.nio.channels.ClosedByInterruptException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
        at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:269)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.adjustCrcChannelPosition(FsDatasetImpl.java:1716)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.adjustCrcFilePosition(BlockReceiver.java:1052)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:733)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:923)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:854)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:288)
        at java.lang.Thread.run(Thread.java:748)
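The ClosedByInterruptException in step 4 is NIO's interruptible-channel behavior: when the thread performing a blocking FileChannel operation (here, FsDatasetImpl.adjustCrcChannelPosition repositioning the checksum file) is interrupted, the channel is closed and this exception is thrown. A minimal, standalone sketch of that mechanism (plain JDK code, not Hadoop's):

{code:java}
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: once a thread's interrupt flag is set, the next blocking
// operation on an interruptible FileChannel closes the channel and throws
// ClosedByInterruptException -- the same mechanism behind the step-4 exception.
public class InterruptedChannelDemo {
    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("blk-demo", ".meta");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            Thread.currentThread().interrupt();        // simulate the interrupt
            try {
                ch.write(ByteBuffer.wrap(new byte[]{1, 2, 3}));
            } catch (ClosedByInterruptException e) {
                System.out.println("channel closed by interrupt: " + e);
            } finally {
                Thread.interrupted();                  // clear the flag so cleanup can proceed
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
{code}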
5. This is followed by the DataNode resetting bytesOnDisk to the correct length of 3072, which is expected behavior per https://issues.apache.org/jira/browse/hdfs-11472 (a small model of this reset follows the log):
2021-01-25 03:42:40,007 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl (DataXceiver for client DFSClient_NONMAPREDUCE_390070548_42 at /172.21.240.31:60430 [Receiving block BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578]): At ip-172-21-226-26.ec2.internal:50010, Recovering ReplicaBeingWritten, blk_1342440165_272630578, RBW
  getNumBytes() = 3072
  getBytesOnDisk() = 2678
  getVisibleLength()= 3072
  getVolume() = /mnt/hdfs/current
  getBlockFile() = /mnt/hdfs/current/BP-908477295-172.21.224.178-1606768078949/current/rbw/blk_1342440165
  bytesAcked=3072
  bytesOnDisk=2678
2021-01-25 03:42:40,011 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (DataXceiver for client DFSClient_attempt_16089610225137_1287341_r_000139_0_392781198_30 at /172.21.244.117:34354 [Receiving block BP-908477295-172.21.224.178-1606768078949:blk_1342762434_272648220]): Receiving BP-908477295-172.21.224.178-1606768078949:blk_1342762434_272648220 src: /172.21.244.117:34354 dest: /172.21.226.26:50010
2021-01-25 03:42:40,013 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl (DataXceiver for client DFSClient_NONMAPREDUCE_390070548_42 at /172.21.240.31:60430 [Receiving block BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272630578]): Resetting bytesOnDisk to match blockDataLength (=3072) for replica ReplicaBeingWritten, blk_1342440165_272630578, RBW
  getNumBytes() = 3072
  getBytesOnDisk() = 2678
  getVisibleLength()= 3072
  getVolume() = /mnt/hdfs/current
  getBlockFile() = /mnt/hdfs/current/BP-908477295-172.21.224.178-1606768078949/current/rbw/blk_1342440165
  bytesAcked=3072
  bytesOnDisk=2678
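What the step-5 log shows is RBW recovery finding that the replica's in-memory bytesOnDisk counter (2678) lags the actual length of the block data file (blockDataLength = 3072) and resetting the counter to the on-disk length, which is the HDFS-11472 behavior referenced above. A minimal, standalone model of that bookkeeping (illustrative only, not the FsDatasetImpl code):

{code:java}
import java.io.File;
import java.io.RandomAccessFile;

// Illustrative only -- models the bookkeeping in the step-5 log. The writer got
// 3072 bytes into the block file before being interrupted, but the replica's
// bytesOnDisk counter was last updated at 2678. Recovery trusts the actual file
// length and resets the counter to match it.
public class RbwRecoveryModel {
    public static void main(String[] args) throws Exception {
        File blockFile = File.createTempFile("blk_demo", null);
        try (RandomAccessFile raf = new RandomAccessFile(blockFile, "rw")) {
            raf.write(new byte[3072]);   // data actually present on disk
        }
        long bytesAcked = 3072;          // acknowledged to the client
        long bytesOnDisk = 2678;         // stale counter left by the interrupted writer

        long blockDataLength = blockFile.length();
        if (bytesOnDisk != blockDataLength) {
            System.out.printf("Resetting bytesOnDisk (%d) to match blockDataLength (=%d)%n",
                    bytesOnDisk, blockDataLength);
            bytesOnDisk = blockDataLength;
        }
        System.out.println("bytesAcked=" + bytesAcked + " bytesOnDisk=" + bytesOnDisk);
        blockFile.delete();
    }
}
{code}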
6. Since the recovery failed, the NameNode restarts the updatePipeline for the block by bumping the generation stamp (a rough sketch of the underlying RPC follows below):
2021-01-25 03:42:40,034 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem (IPC Server handler 15 on 8020): updatePipeline(blk_1342440165_272630578, newGS=272645069, newLength=3072, newNodes=[172.21.226.26:50010], client=DFSClient_NONMAPREDUCE_390070548_42)
2021-01-25 03:42:40,034 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem (IPC Server handler 15 on 8020): updatePipeline(blk_1342440165_272630578 => blk_1342440165_272645069) success
Notice that the correct newLength, i.e. 3072, is sent from the NameNode in the above log.
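For context, the step-6 log lines are the NameNode side of the client's updatePipeline RPC; newGS and newLength are carried by the new block the client passes in. Below is a rough sketch of that ClientProtocol method with the values from this incident noted as comments. The parameter names are written from memory, so treat the exact signature as an assumption and verify against the Hadoop sources:

{code:java}
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.DatanodeID;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;

// Sketch only (from memory) of the ClientProtocol method behind the step-6
// NameNode log lines -- not a verified copy of the Hadoop interface.
interface UpdatePipelineSketch {
    void updatePipeline(String clientName,      // DFSClient_NONMAPREDUCE_390070548_42
                        ExtendedBlock oldBlock, // blk_1342440165_272630578
                        ExtendedBlock newBlock, // blk_1342440165_272645069, numBytes = 3072 (newLength)
                        DatanodeID[] newNodes,  // [172.21.226.26:50010]
                        String[] newStorageIDs) // storage IDs for the new pipeline
            throws IOException;
}
{code}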
7. The DataNode then correctly receives the new generation of this block:
2021-01-25 03:46:40,066 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace (PacketResponder: BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272645069, type=LAST_IN_PIPELINE): src: /172.21.240.31:60430, dest: /172.21.226.26:50010, bytes: 3554, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_390070548_42, offset: 0, srvID: 1a2ff556-73f5-4e10-b018-a0c2105b242e, blockid: BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272645069, duration: 240052941459
2021-01-25 03:46:40,066 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (PacketResponder: BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272645069, type=LAST_IN_PIPELINE): PacketResponder: BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272645069, type=LAST_IN_PIPELINE terminating
But notice here that bytes: 3554 differs from what the NameNode sent, i.e. 3072.
8. The DataNode then successfully replicates this block downstream to another node:
2021-01-25 03:46:43,093 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (BP-908477295-172.21.224.178-1606768078949 heartbeating to /172.21.238.225:8020): DatanodeRegistration(172.21.226.26:50010, datanodeUuid=1a2ff556-73f5-4e10-b018-a0c2105b242e, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-57;cid=CID-7608840c-3634-4377-9ba0-1c710e6d08d5;nsid=2011779031;c=1606768078949) Starting thread to transfer BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272645069 to 172.21.234.181:50010
2021-01-25 03:46:43,094 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@5cb29d58): DataTransfer, at ip-172-21-226-26.ec2.internal:50010: Transmitted BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272645069 (numBytes=3554) to /172.21.234.181:5001
9. However, immediately after the successful transmission and replication, the NameNode declares the block corrupt:
2021-01-25 03:46:43,106 INFO org.apache.hadoop.hdfs.StateChange (IPC Server handler 46 on 8020): *DIR* reportBadBlocks for block: BP-908477295-172.21.224.178-1606768078949:blk_1342440165_272645069 on datanode: 172.21.226.26:50010
Any suggestions on what could cause this difference between the bytes communicated by the NameNode (3072) and what is written on the DataNode (3554)?