Details
Improvement
Status: Resolved
Minor
Resolution: Fixed
3.3.4
Reviewed
Description
Recently, we received a phone alarm about missing blocks. On one of the datanodes where the block was placed, we found the following log entries:
2023-02-09 15:05:10,376 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received BP-578784987-x.x.x.x-1667291826362:blk_1305044966_231832415 src: /clientAddress:44638 dest: /localAddress:50010 of size 45733720
2023-02-09 15:05:10,376 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received BP-578784987-x.x.x.x-1667291826362:blk_1305044966_231826462 src: /upStreamDatanode:60316 dest: /localAddress:50010 of size 45733720
The datanode received the same block with two different generation stamps because of a socket timeout exception. blk_1305044966_231826462 was received from the upstream datanode in a pipeline of two datanodes. blk_1305044966_231832415 was received directly from the client.
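To spot this situation in a datanode log, the block names can be parsed and grouped by block ID. The following is only a minimal sketch, not part of Hadoop: the class name GenStampCheck and the method stampsByBlock are hypothetical, and it assumes block names follow the blk_<blockId>_<generationStamp> form seen in the log entries above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Hadoop class.
public class GenStampCheck {
    // Assumed block name form: blk_<blockId>_<generationStamp>.
    private static final Pattern BLOCK = Pattern.compile("blk_(-?\\d+)_(\\d+)");

    /** Groups every generation stamp found in the log lines by block ID. */
    static Map<Long, Set<Long>> stampsByBlock(List<String> lines) {
        Map<Long, Set<Long>> result = new TreeMap<>();
        for (String line : lines) {
            Matcher m = BLOCK.matcher(line);
            while (m.find()) {
                long blockId = Long.parseLong(m.group(1));
                long genStamp = Long.parseLong(m.group(2));
                result.computeIfAbsent(blockId, k -> new TreeSet<>()).add(genStamp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened versions of the two "Received" entries from the datanode log.
        List<String> lines = Arrays.asList(
            "Received BP-578784987-x.x.x.x-1667291826362:blk_1305044966_231832415 src: /clientAddress:44638",
            "Received BP-578784987-x.x.x.x-1667291826362:blk_1305044966_231826462 src: /upStreamDatanode:60316");

        // Report every block that appears with more than one generation stamp.
        stampsByBlock(lines).forEach((blockId, stamps) -> {
            if (stamps.size() > 1) {
                System.out.println("blk_" + blockId + " has generation stamps " + stamps);
            }
        });
    }
}
```

This only shows that a block was received with more than one generation stamp. It does not tell us where each stamp was assigned, which is the information missing from the logs.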
We searched all log entries for blk_1305044966 on the namenode and on the three datanodes in the original pipeline, but could not find any helpful message about generation stamp 231826462. After diving into the source code, we found that it was assigned in NameNodeRpcServer#updateBlockForPipeline, which is invoked from DataStreamer#setupPipelineInternal. The updateBlockForPipeline RPC does not log anything, so I think we should add some logging to this RPC.
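A rough illustration of the kind of log line that would have helped here is sketched below. It is not the actual change to NameNodeRpcServer: the class PipelineUpdateLogSketch, the helper describeUpdate, the message format, and the use of java.util.logging are all assumptions made to keep the example self-contained, and the new generation stamp is passed in rather than assigned by a namenode.

```java
import java.util.logging.Logger;

// Hypothetical stand-in, not Hadoop code.
public class PipelineUpdateLogSketch {
    private static final Logger LOG =
        Logger.getLogger(PipelineUpdateLogSketch.class.getName());

    /** Builds a message that ties the old block name to its new generation stamp. */
    static String describeUpdate(String blockPoolId, long blockId,
                                 long oldGenStamp, long newGenStamp,
                                 String clientName) {
        return String.format(
            "updateBlockForPipeline: %s:blk_%d_%d -> new generation stamp %d, client=%s",
            blockPoolId, blockId, oldGenStamp, newGenStamp, clientName);
    }

    /**
     * Stand-in for the RPC handler. In HDFS the namenode assigns the new
     * generation stamp; here it is a parameter so the sketch runs on its own.
     */
    static long updateBlockForPipeline(String blockPoolId, long blockId,
                                       long oldGenStamp, long newGenStamp,
                                       String clientName) {
        LOG.info(describeUpdate(blockPoolId, blockId, oldGenStamp, newGenStamp, clientName));
        return newGenStamp;
    }

    public static void main(String[] args) {
        // All values below are made up for illustration.
        updateBlockForPipeline("BP-example", 42L, 7L, 8L, "DFSClient_example");
    }
}
```

With a line like this on the namenode, a search for the block ID would show every generation stamp handed out for it, together with the client that requested the update.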