Description
The following step can reproduce this issue
- A new ozone cluster with only a factor three pipeline
- put a big file(1G) into cluster, during the put process, we kill the leader datanode of this pipeline.
The put command will hang, the following log will fill the scm log file.
2020-05-27 17:32:46,988 [IPC Server handler 23 on default port 9863] WARN org.apache.hadoop.hdds.scm.container.SCMContainerManager: Container allocation failed for pipeline=Pipeline[ Id: bf7dd356-2d97-4b2a-8a81-e2ddd25bc5a1, Nodes: e859cad9-c7f6-451a-a039-af06103aa978
1cd2bf20-a791-42a0-b4cd-b26d995cb8eb
{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}0827f3bb-0d94-435a-a157-4db2c84cdedf
{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:3, State:OPEN, leaderId:0827f3bb-0d94-435a-a157-4db2c84cdedf, CreationTimestamp2020-05-27T08:05:36.590Z] requiredSize=268435456 {}
org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: PipelineID=bf7dd356-2d97-4b2a-8a81-e2ddd25bc5a1 not found
at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getContainers(PipelineStateMap.java:301)
at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.getContainers(PipelineStateManager.java:95)
at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.getContainersInPipeline(SCMPipelineManager.java:360)
at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getContainersForOwner(SCMContainerManager.java:507)
at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getMatchingContainer(SCMContainerManager.java:428)
at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:230)
at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:190)
at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:167)
at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:119)
at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:74)
at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:100)
at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13303)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)