Apache Ozone / HDDS-3669

SCM Infinite loop in BlockManagerImpl.allocateBlock


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: SCM
    • Labels: None

    Description

      The following steps reproduce this issue:

      • Start a new Ozone cluster that has only one factor-three pipeline.
      • Put a large file (1 GB) into the cluster and, while the put is in progress, kill the leader datanode of this pipeline.

      The put command hangs, and the following message is logged repeatedly until it fills the SCM log file:
      2020-05-27 17:32:46,988 [IPC Server handler 23 on default port 9863] WARN org.apache.hadoop.hdds.scm.container.SCMContainerManager: Container allocation failed for pipeline=Pipeline[ Id: bf7dd356-2d97-4b2a-8a81-e2ddd25bc5a1, Nodes: e859cad9-c7f6-451a-a039-af06103aa978{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}1cd2bf20-a791-42a0-b4cd-b26d995cb8eb{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}0827f3bb-0d94-435a-a157-4db2c84cdedf{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:3, State:OPEN, leaderId:0827f3bb-0d94-435a-a157-4db2c84cdedf, CreationTimestamp2020-05-27T08:05:36.590Z] requiredSize=268435456 {}
      org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: PipelineID=bf7dd356-2d97-4b2a-8a81-e2ddd25bc5a1 not found
      at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getContainers(PipelineStateMap.java:301)
      at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.getContainers(PipelineStateManager.java:95)
      at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.getContainersInPipeline(SCMPipelineManager.java:360)
      at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getContainersForOwner(SCMContainerManager.java:507)
      at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getMatchingContainer(SCMContainerManager.java:428)
      at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:230)
      at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:190)
      at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:167)
      at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:119)
      at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:74)
      at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:100)
      at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13303)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
      at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
      at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
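
The trace shows the failure surfacing in getMatchingContainer, called from BlockManagerImpl.allocateBlock, for a pipeline that is still reported as State:OPEN but can no longer be found. The report does not include the loop itself, so the following is only a minimal sketch of one way such a retry could be bounded, assuming the allocator keeps retrying against a stale pipeline list. Every name in it (BoundedAllocateSketch, findContainer, allocate, maxAttempts) is hypothetical and is not an Ozone API.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical sketch only; none of these names exist in Ozone. */
final class BoundedAllocateSketch {

  /**
   * Stand-in for a container lookup. An empty result models the case in the
   * log where the pipeline ID is no longer known (PipelineNotFoundException).
   */
  static Optional<String> findContainer(String pipelineId, Set<String> knownPipelines) {
    if (!knownPipelines.contains(pipelineId)) {
      return Optional.empty();
    }
    return Optional.of("container-on-" + pipelineId);
  }

  /**
   * Tries each candidate pipeline, re-reading the candidate list on every
   * attempt, and gives up after maxAttempts instead of retrying forever.
   */
  static String allocate(Supplier<List<String>> candidatePipelines,
                         Set<String> knownPipelines,
                         int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      for (String pipelineId : candidatePipelines.get()) {
        Optional<String> container = findContainer(pipelineId, knownPipelines);
        if (container.isPresent()) {
          return container.get();
        }
      }
    }
    throw new IllegalStateException(
        "No usable pipeline after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) {
    // The candidate list still names a pipeline that is no longer known,
    // so the call fails after three attempts rather than hanging.
    try {
      allocate(() -> List.of("stale-pipeline"), Set.of(), 3);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a bound like this, the client would receive an error once the attempts are exhausted instead of hanging, and the SCM log would stop growing. Whether a bound, a refresh of the pipeline list, or both is the right fix in Ozone is not something this report settles.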


          People

            Assignee: Baolong Mao (maobaolong)
            Reporter: Baolong Mao (maobaolong)
