Uploaded image for project: 'Apache Ozone'
  1. Apache Ozone
  2. HDDS-5785

containerCreationLock is a severe bottleneck when there are many containers to create

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.2.0
    • None

    Description

      Here is the trace log of the tool Arthas. Most of the execution time is contributed by lock aquire. Please be noted that these logs are captured from different call stacks, the data only shows the trend.

      `---ts=2021-09-24 21:41:05;thread_name=201a016a-2966-40b3-b289-12413f6197b4@group-1998503DB7A6-SegmentedRaftLogWorker;id=157;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@6659c656
      `---[3193.757119ms] org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$WriteLog:execute()
      +---[0.00279ms] org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:access$800() #505
      +---[0.001601ms] org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StateMachineDataPolicy:isSync() #505
      +---[0.002194ms] org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:access$800() #506
      +---[3193.626701ms] org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StateMachineDataPolicy:getFromFuture() #506
      +---[0.003975ms] org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:access$200() #509
      +---[0.004119ms] org.apache.ratis.server.metrics.SegmentedRaftLogMetrics:onRaftLogAppendEntry() #509
      +---[0.002571ms] org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:access$900() #510
      +---[0.001707ms] org.apache.ratis.util.Preconditions:assertTrue() #510
      +---[0.002315ms] org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:access$1000() #511
      +---[0.002079ms] org.apache.ratis.proto.RaftProtos$LogEntryProto:getIndex() #511
      +---[0.001338ms] org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:access$1000() #512
      +---[0.00187ms] org.apache.ratis.util.Preconditions:assertTrue() #511
      +---[0.003476ms] org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:access$900() #513
      +---[0.014489ms] org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream:write() #513
      +---[0.001518ms] org.apache.ratis.proto.RaftProtos$LogEntryProto:getIndex() #514
      +---[0.001904ms] org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:access$1002() #514
      +---[0.001927ms] org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:access$1108() #515
      `---[0.002058ms] org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:access$1200() #516

      `---ts=2021-09-26 10:47:31;thread_name=ChunkWriter-80-0;id=d7;is_daemon=true;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@6659c656
      `---[61594.581776ms] org.apache.hadoop.ozone.container.common.impl.HddsDispatcher:dispatchRequest()
      +---[0.001458ms] com.google.common.base.Preconditions:checkNotNull() #180
      +---[8.67E-4ms] org.slf4j.Logger:isTraceEnabled() #181
      +---[9.84E-4ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto:getCmdType() #187
      +---[0.001111ms] org.apache.hadoop.ozone.container.common.helpers.ContainerCommandRequestPBHelper:getAuditAction() #186
      +---[0.001595ms] org.apache.hadoop.ozone.container.common.impl.HddsDispatcher:getEventType() #188
      +---[0.002115ms] org.apache.hadoop.ozone.container.common.helpers.ContainerCommandRequestPBHelper:getAuditParams() #190
      +---[8.22E-4ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto:getCmdType() #195
      +---[9.09E-4ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto:getContainerID() #196
      +---[0.002376ms] org.apache.hadoop.ozone.container.common.helpers.ContainerMetrics:incContainerOpsMetrics() #197
      +---[0.003211ms] org.apache.hadoop.ozone.container.common.impl.HddsDispatcher:getContainer() #198
      +---[0.001122ms] org.apache.hadoop.ozone.container.common.transport.server.ratis.DispatcherContext:getStage() #201
      +---[8.56E-4ms] org.apache.hadoop.ozone.container.common.transport.server.ratis.DispatcherContext:getStage() #205
      +---[0.010324ms] org.apache.hadoop.ozone.container.common.impl.HddsDispatcher:validateToken() #209
      +---[8.47E-4ms] org.apache.hadoop.ozone.container.common.transport.server.ratis.DispatcherContext:getStage() #220
      +---[8.62E-4ms] org.apache.hadoop.ozone.container.common.transport.server.ratis.DispatcherContext:getContainer2BCSIDMap() #224
      +---[0.00112ms] org.apache.hadoop.ozone.container.common.impl.HddsDispatcher:getMissingContainerSet() #240
      +---[61593.306048ms] org.apache.hadoop.ozone.container.common.impl.HddsDispatcher:createContainer() #258
      +---[0.001499ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandResponseProto:getResult() #259
      +---[0.001059ms] com.google.common.base.Preconditions:checkArgument() #266
      +---[0.002173ms] org.apache.hadoop.ozone.container.common.impl.HddsDispatcher:getContainer() #273
      +---[0.003009ms] org.apache.hadoop.ozone.container.common.impl.HddsDispatcher:getContainerType() #284
      +---[0.001432ms] org.apache.hadoop.hdds.HddsUtils:isReadOnly() #295
      +---[0.002478ms] org.apache.hadoop.ozone.container.common.impl.HddsDispatcher:sendCloseContainerActionIfNeeded() #296
      +---[0.001382ms] org.apache.hadoop.ozone.container.common.impl.HddsDispatcher:getHandler() #298
      +---[1.169763ms] org.apache.hadoop.ozone.container.common.interfaces.Handler:handle() #307
      +---[0.001698ms] org.apache.hadoop.ozone.container.common.helpers.ContainerMetrics:incContainerOpsLatencies() #309
      +---[0.00119ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandResponseProto:getResult() #323
      +---[0.001039ms] org.apache.hadoop.hdds.HddsUtils:isReadOnly() #329
      +---[0.001083ms] org.apache.hadoop.ozone.container.common.impl.HddsDispatcher:canIgnoreException() #329
      +---[0.001669ms] org.apache.hadoop.ozone.container.common.impl.HddsDispatcher:updateBCSID() #364
      `---[0.001459ms] org.apache.hadoop.ozone.container.common.impl.HddsDispatcher:audit() #365

      `---ts=2021-09-26 10:57:09;thread_name=ChunkWriter-63-0;id=c6;is_daemon=true;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@6659c656
      `---[40513.739685ms] org.apache.hadoop.ozone.container.common.impl.HddsDispatcher:createContainer()
      +---[0.001661ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$CreateContainerRequestProto:newBuilder() #406
      +---[0.001753ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$CreateContainerRequestProto$Builder:setContainerType() #409
      +---[0.002695ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto:newBuilder() #412
      +---[0.002142ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto$Builder:setCmdType() #413
      +---[0.001441ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto:getContainerID() #414
      +---[0.001299ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto$Builder:setContainerID() #414
      +---[0.003278ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$CreateContainerRequestProto$Builder:build() #415
      +---[0.001737ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto$Builder:setCreateContainer() #415
      +---[0.001148ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto:getPipelineID() #416
      +---[0.001226ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto$Builder:setPipelineID() #416
      +---[0.002452ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto:getDatanodeUuid() #417
      +---[0.001146ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto$Builder:setDatanodeUuid() #417
      +---[0.001105ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto:getTraceID() #418
      +---[0.001183ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto$Builder:setTraceID() #418
      +---[0.00189ms] org.apache.hadoop.ozone.container.common.impl.HddsDispatcher:getHandler() #422
      +---[0.001769ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto$Builder:build() #423
      `---[40513.674318ms] org.apache.hadoop.ozone.container.common.interfaces.Handler:handle() #423

      `---ts=2021-09-26 11:00:18;thread_name=ChunkWriter-10-0;id=91;is_daemon=true;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@6659c656
      `---[52072.335521ms] org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler:handle()
      `---[52072.328455ms] org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler:dispatchRequest() #170

      `---ts=2021-09-26 11:02:09;thread_name=ChunkWriter-60-0;id=c3;is_daemon=true;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@6659c656
      `---[51435.670861ms] org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler:dispatchRequest()
      +---[8.21E-4ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto:getCmdType() #178
      +---[4.32E-4ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$Type:ordinal() #180
      `---[51435.663218ms] org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler:handleCreateContainer() #182

      `---ts=2021-09-26 11:03:50;thread_name=ChunkWriter-33-0;id=a8;is_daemon=true;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@6659c656
      `---[13057.606294ms] org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler:handleCreateContainer()
      +---[0.001429ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto:hasCreateContainer() #239
      +---[0.001533ms] com.google.common.base.Preconditions:checkArgument() #248
      +---[8.95E-4ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto:getContainerID() #250
      +---[0.002883ms] org.apache.hadoop.ozone.container.common.impl.ChunkLayOutVersion:getConfiguredVersion() #253
      +---[8.89E-4ms] org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$ContainerCommandRequestProto:getPipelineID() #255
      +---[0.001117ms] org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler:getDatanodeId() #256
      +---[0.002143ms] org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerData:<init>() #256
      +---[0.002526ms] org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer:<init>() #259
      +---[12661.234468ms] org.apache.hadoop.util.AutoCloseableLock:acquire() #263
      +---[0.004718ms] org.apache.hadoop.ozone.container.common.impl.ContainerSet:getContainer() #264
      +---[396.290691ms] org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer:create() #265
      +---[0.006224ms] org.apache.hadoop.ozone.container.common.impl.ContainerSet:addContainer() #266
      +---[0.006072ms] org.apache.hadoop.util.AutoCloseableLock:close() #273
      +---[0.009352ms] org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler:sendICR() #279
      `---[0.002687ms] org.apache.hadoop.hdds.scm.protocolPB.ContainerCommandResponseBuilders:getSuccessResponse() #284

      Attachments

        Issue Links

          Activity

            People

              Sammi Sammi Chen
              Sammi Sammi Chen
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: