
IMPALA-6356: Excessive synchronous logging in RpczStore::LogTrace causes a severe 2-3 minute slowdown for exchange operators

    Details


      Description

      This applies when KRPC is enabled.

      When RPCs take longer than expected for various reasons, RpczStore starts logging the slow RPCs. The glog library used by the Impala backend (visible in the stacks below as google::LogMessage) logs synchronously, which means that every time a thread writes to the log it has to acquire a lock.

      This logging adds further slowdown on top of what is already happening: RPC handler threads block on the log lock instead of responding to callers.
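
      To make the failure mode concrete, here is a minimal, hypothetical C++ sketch (not Impala or glog code; SynchronousLog and HandleRpc are invented names for illustration) of the pattern visible in the stacks below: every logging thread funnels through one global lock, and the lock holder performs blocking file I/O, so handler threads serialize behind the logger.

      // Minimal sketch of synchronous logging contention. Every thread that
      // logs must take the same global lock that the current writer holds
      // while doing blocking file I/O, so slow-RPC warnings serialize the
      // very RPC handler threads they are reporting on.
      #include <cstdio>
      #include <mutex>
      #include <string>
      #include <thread>
      #include <vector>

      std::mutex g_log_mutex;     // plays the role of glog's internal lock
      FILE* g_log_file = stderr;  // plays the role of the impalad log file

      void SynchronousLog(const std::string& line) {
        std::lock_guard<std::mutex> lock(g_log_mutex);  // threads funnel here
        std::fputs(line.c_str(), g_log_file);
        std::fflush(g_log_file);  // blocking I/O while the lock is held
      }

      void HandleRpc(int call_id) {
        // ... process the RPC, then report it as slow. Under load, many
        // handler threads queue up on g_log_mutex at this point.
        SynchronousLog("Call TransmitData (call id " +
                       std::to_string(call_id) + ") took too long\n");
      }

      int main() {
        std::vector<std::thread> handlers;
        for (int i = 0; i < 64; ++i) handlers.emplace_back(HandleRpc, i);
        for (auto& t : handlers) t.join();
        return 0;
      }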

      Sample stack for blocked threads

      #0  0x000000326c60ad20 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
      #1  0x00000000020b0d94 in google::LogMessage::Flush() ()
      #2  0x00000000020b0f99 in google::LogMessage::~LogMessage() ()
      #3  0x000000000118766d in kudu::rpc::RpczStore::LogTrace(kudu::rpc::InboundCall*) ()
      #4  0x0000000001189611 in kudu::rpc::RpczStore::AddCall(kudu::rpc::InboundCall*) ()
      #5  0x000000000118599c in kudu::rpc::InboundCall::Respond(google::protobuf::MessageLite const&, bool) ()
      #6  0x0000000001185c44 in kudu::rpc::InboundCall::RespondSuccess(google::protobuf::MessageLite const&) ()
      #7  0x000000000117f776 in kudu::rpc::RpcContext::RespondSuccess(google::protobuf::Message const&) ()
      #8  0x0000000000ba4a08 in void impala::DataStreamService::Reply<impala::TransmitDataResponsePB>(impala::Status const&, kudu::rpc::RpcContext*) ()
      #9  0x0000000000b18747 in impala::KrpcDataStreamRecvr::SenderQueue::AddBatch(impala::TransmitDataRequestPB const*, kudu::rpc::RpcContext*) ()
      #10 0x0000000000b0e750 in impala::KrpcDataStreamMgr::AddData(impala::TransmitDataRequestPB const*, kudu::rpc::RpcContext*) ()
      

      Stack for the thread that holds the lock (blocked in posix_fadvise64 inside glog's log file writer)

      Thread 1616 (Thread 0x7fdb5a1cd700 (LWP 134451)):
      #0  0x000000326c2df38d in posix_fadvise64 () from /lib64/libc.so.6
      #1  0x00000000020b7c24 in google::(anonymous namespace)::LogFileObject::Write(bool, long, char const*, int) ()
      #2  0x00000000020b2bac in google::LogMessage::SendToLog() ()
      #3  0x00000000020b0c37 in google::LogMessage::Flush() ()
      #4  0x00000000020b0f99 in google::LogMessage::~LogMessage() ()
      #5  0x000000000118766d in kudu::rpc::RpczStore::LogTrace(kudu::rpc::InboundCall*) ()
      #6  0x0000000001189611 in kudu::rpc::RpczStore::AddCall(kudu::rpc::InboundCall*) ()
      #7  0x000000000118599c in kudu::rpc::InboundCall::Respond(google::protobuf::MessageLite const&, bool) ()
      #8  0x0000000001185c44 in kudu::rpc::InboundCall::RespondSuccess(google::protobuf::MessageLite const&) ()
      #9  0x000000000117f776 in kudu::rpc::RpcContext::RespondSuccess(google::protobuf::Message const&) ()
      #10 0x0000000000ba4a08 in void impala::DataStreamService::Reply<impala::TransmitDataResponsePB>(impala::Status const&, kudu::rpc::RpcContext*) ()
      #11 0x0000000000b18747 in impala::KrpcDataStreamRecvr::SenderQueue::AddBatch(impala::TransmitDataRequestPB const*, kudu::rpc::RpcContext*) ()
      #12 0x0000000000b0e750 in impala::KrpcDataStreamMgr::AddData(impala::TransmitDataRequestPB const*, kudu::rpc::RpcContext*) ()
      

      From the impalad log

      2298) took 3451ms. Request Metrics: {}
      I1228 16:13:14.730983 134370 rpcz_store.cc:248] Call impala.DataStreamService.TransmitData from 10.17.221.38:60573 (request call id 3043448) took 3458ms. Request Metrics: {}
      I1228 16:13:14.734107 134525 rpcz_store.cc:248] Call impala.DataStreamService.TransmitData from 10.17.221.19:51812 (request call id 3087015) took 3460ms. Request Metrics: {}
      I1228 16:13:14.738078 134138 rpcz_store.cc:248] Call impala.DataStreamService.TransmitData from 10.17.221.26:35325 (request call id 3066300) took 3462ms. Request Metrics: {}
      I1228 16:13:14.742720 134139 rpcz_store.cc:248] Call impala.DataStreamService.TransmitData from 10.17.221.37:57936 (request call id 3053778) took 3467ms. Request Metrics: {}
      I1228 16:13:14.743108 134437 rpcz_store.cc:248] Call impala.DataStreamService.TransmitData from 10.17.221.30:34257 (request call id 3061639) took 3467ms. Request Metrics: {}
      I1228 16:13:14.743425 134459 rpcz_store.cc:248] Call impala.DataStreamService.TransmitData from 10.17.221.27:49081 (request call id 3089754) took 3467ms. Request Metrics: {}
      I1228 16:13:14.743743 134390 rpcz_store.cc:248] Call impala.DataStreamService.TransmitData from 10.17.221.19:51812 (request call id 3087016) took 3467ms. Request Metrics: {}
      I1228 16:13:14.744191 134200 rpcz_store.cc:248] Call impala.DataStreamService.TransmitData from 10.17.221.29:57825 (request call id 3054627) took 3467ms. Request Metrics: {}
      I1228 16:13:14.744521 134430 rpcz_store.cc:248] Call impala.DataStreamService.TransmitData from 10.17.221.11:34974 (request call id 3214166) took 3466ms. Request Metrics: {}
      

      An ideal solution would be to make the logging asynchronous, or to aggregate slow RPCs and print a single histogram for all RPCs within a time window instead of one line per call.
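
      As a rough, hypothetical sketch of the histogram idea (not a patch; the SlowRpcHistogram name and the bucket boundaries are invented for illustration), slow-RPC durations could be accumulated into a small set of counters, with one summary line emitted per time window so the hot path never touches the log file:

      // Hot path only bumps a counter under a short critical section; the
      // blocking file write happens once per window, from one thread.
      #include <array>
      #include <cstdio>
      #include <mutex>

      class SlowRpcHistogram {
       public:
        void Record(int duration_ms) {
          std::lock_guard<std::mutex> lock(mu_);
          ++buckets_[BucketFor(duration_ms)];  // no file I/O on the hot path
        }

        // Intended to be called by a single background thread, e.g. once
        // every few seconds.
        void FlushWindow() {
          std::array<int, kNumBuckets> snapshot;
          {
            std::lock_guard<std::mutex> lock(mu_);
            snapshot = buckets_;
            buckets_.fill(0);
          }
          std::fprintf(stderr,
                       "slow RPCs this window: <1s=%d 1-2s=%d 2-4s=%d >=4s=%d\n",
                       snapshot[0], snapshot[1], snapshot[2], snapshot[3]);
        }

       private:
        static constexpr int kNumBuckets = 4;
        static int BucketFor(int ms) {
          if (ms < 1000) return 0;
          if (ms < 2000) return 1;
          if (ms < 4000) return 2;
          return 3;
        }
        std::mutex mu_;
        std::array<int, kNumBuckets> buckets_{};
      };

      Either way the blocking write leaves the RPC handler's critical path; the histogram variant also collapses the per-call log flood shown in the excerpt above.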


              People

              • Assignee: Michael Ho (kwho)
              • Reporter: Mostafa Mokhtar (mmokhtar)
