Apache Ozone / HDDS-10827

ContainerMetrics.incContainerOpsLatencies() is expensive and blocking

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Later

    Description

      When running the freon dn-echo tool in gRPC mode, ContainerMetrics.incContainerOpsLatencies() accounts for 25% of CPU time. Moreover, it internally calls a synchronized method, MutableQuantiles.add(), so request-handling threads block on the same monitor (see the jstack below).

      Because of this, a DataNode can't process more than 30k requests per second.

      We should consider either removing this usage or finding an alternative way to calculate operation latency.
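
      As one possible direction, here is a minimal sketch of a latency accumulator that avoids taking a monitor on the request path. It is only an illustration: OpLatencyStats and its methods are hypothetical names, not existing Ozone or Hadoop classes. It assumes that a count and a mean are acceptable, and it gives up the quantile estimates that MutableQuantiles provides.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical lock-free latency accumulator; not part of Ozone or Hadoop. */
final class OpLatencyStats {
  private final LongAdder count = new LongAdder();
  private final LongAdder totalNanos = new LongAdder();

  /** Records one operation. LongAdder updates do not take a monitor. */
  void record(long latencyNanos) {
    count.increment();
    totalNanos.add(latencyNanos);
  }

  /** Number of operations recorded so far. */
  long count() {
    return count.sum();
  }

  /**
   * Mean latency in nanoseconds, or 0 if nothing was recorded.
   * The two sums are read separately, so the value is approximate
   * while writers are still active.
   */
  double meanNanos() {
    long n = count.sum();
    return n == 0 ? 0.0 : (double) totalNanos.sum() / n;
  }

  public static void main(String[] args) throws InterruptedException {
    OpLatencyStats stats = new OpLatencyStats();
    ExecutorService pool = Executors.newFixedThreadPool(8);
    for (int t = 0; t < 8; t++) {
      pool.execute(() -> {
        for (int i = 0; i < 100_000; i++) {
          long start = System.nanoTime();
          // The request being measured would be handled here.
          stats.record(System.nanoTime() - start);
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    System.out.println("ops=" + stats.count() + " meanNanos=" + stats.meanNanos());
  }
}
```

      Whether dropping quantiles is acceptable depends on how the latency metrics are consumed. If quantiles are still needed, sampling only a fraction of requests into the existing estimator would be another option to evaluate.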

       

       

      jstack:

       

      "360b568b-92e2-43ea-83c7-67fb4808e85b-ChunkReader-80" #437 daemon prio=5 os_prio=0 tid=0x00007f01ac64e000 nid=0x2c61db waiting for monitor entry [0x00007f0177eb3000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.metrics2.lib.MutableQuantiles.add(MutableQuantiles.java:133)
              - waiting to lock <0x00000006cb71d188> (a org.apache.hadoop.metrics2.lib.MutableQuantiles)
              at org.apache.hadoop.ozone.container.common.helpers.ContainerMetrics.incContainerOpsLatencies(ContainerMetrics.java:124)
              at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:350)
              at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$0(HddsDispatcher.java:194)
              at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher$$Lambda$1117/1215537910.apply(Unknown Source)
              at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:91)
              at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:193)
              at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:112)
              at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:105)
              at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
              at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
              at org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:49)
              at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:329)
              at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:314)
              at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:833)
              at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
              at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748) 

      Attachments

        1. Screenshot 2024-05-07 at 10.36.18 AM.png
          642 kB
          Wei-Chiu Chuang
        2. dn_dne4.html
          264 kB
          Wei-Chiu Chuang

            People

              Assignee: Unassigned
              Reporter: Wei-Chiu Chuang (weichiu)
              Votes: 0
              Watchers: 1
