IMPALA / IMPALA-2567 KRPC milestone 1 / IMPALA-5528

tcmalloc contention much higher with concurrency after KRPC patch

      Description

      Our testing has revealed that under high concurrency (e.g. the many_independent_fragment_instances primitive), KRPC slows down execution significantly.

      This JIRA tracks the overall issue and links to JIRAs for specific spot fixes. The profile below is the result of running perf on one node of a 16-node cluster while it ran the many_independent_fragment_instances primitive.

      -  13.12%  impalad  impalad              [.] tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
         - tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
            - 93.95% tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
               - tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)
                  - 98.16% operator new[](unsigned long)
                       29.20% impala::RowDescriptor::RowDescriptor(impala::RowDescriptor const&)
                       16.85% kudu::rpc::Connection::QueueResponseForCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter<kudu::rpc::InboundCall> >)
                       12.58% impala::DataStreamRecvr::SenderQueue::AddBatch(std::unique_ptr<impala::TransmitDataCtx, std::default_delete<impala::TransmitDataCtx> >&&)
                       7.42% kudu::rpc::OutboundTransfer::CreateForCallResponse(std::vector<kudu::Slice, std::allocator<kudu::Slice> > const&, kudu::rpc::TransferCallbacks*)
                     + 4.34% impala::Codec::CreateDecompressor(impala::MemPool*, bool, impala::THdfsCompression::type, boost::scoped_ptr<impala::Codec>*)
                       4.09% kudu::Trace::Trace()
                       3.79% std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)
                     + 3.59% kudu::rpc::InboundCall::InboundCall(kudu::rpc::Connection*)
                       2.66% void std::vector<impala::MemPool::ChunkInfo, std::allocator<impala::MemPool::ChunkInfo> >::_M_emplace_back_aux<impala::MemPool::ChunkInfo>(impala::MemPool::ChunkInfo&&)
                     + 2.57% kudu::rpc::Connection::HandleIncomingCall(gscoped_ptr<kudu::rpc::InboundTransfer, kudu::DefaultDeleter<kudu::rpc::InboundTransfer> >)
                       2.04% std::vector<kudu::Slice, std::allocator<kudu::Slice> >::reserve(unsigned long)
                       1.92% kudu::rpc::RequestHeader::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)
                       1.91% kudu::rpc::RemoteMethodPB::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)
                       1.48% kudu::rpc::Connection::ReadHandler(ev::io&, int)
                       0.87% kudu::HeapBufferAllocator::AllocateInternal(unsigned long, unsigned long, kudu::BufferAllocator*)
                       0.79% kudu::faststring::GrowArray(unsigned long)
                       0.72% kudu::rpc::OutboundTransfer::CreateForCallRequest(int, std::vector<kudu::Slice, std::allocator<kudu::Slice> > const&, kudu::rpc::TransferCallbacks*)
                       0.69% kudu::rpc::Connection::QueueOutboundCall(std::shared_ptr<kudu::rpc::OutboundCall> const&)
                       0.69% kudu::ArenaBase<true>::ArenaBase(unsigned long, unsigned long)
                       0.68% void std::vector<std::unique_ptr<kudu::ArenaBase<true>::Component, std::default_delete<kudu::ArenaBase<true>::Component> >, std::allocator<std::unique_ptr<kudu::ArenaBase<true>::Component, std::default_delete<kudu::ArenaBase<true>::Component> > > >::_M_emplace_back_aux<std::unique_ptr<kudu::A
                       0.57% impala::TransmitDataResponsePb::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)
                  + 1.84% tc_malloc
            + 3.03% tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)
            + 3.02% tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)
      -  12.49%  impalad  impalad              [.] SpinLock::SpinLoop()
         - SpinLock::SpinLoop()
            - 98.56% SpinLock::SlowLock()
               - 80.48% tcmalloc::CentralFreeList::InsertRange(void*, void*, int)
                  - tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
                     - 99.99% tcmalloc::ThreadCache::Scavenge()
                        - operator delete[](void*, std::nothrow_t const&)
                           - 22.51% impala::RowBatch::RowBatch(impala::RowDescriptor const&, impala::InboundProtoRowBatch const&, impala::MemTracker*)
                                impala::DataStreamRecvr::SenderQueue::AddBatch(std::unique_ptr<impala::TransmitDataCtx, std::default_delete<impala::TransmitDataCtx> >&&)
                             21.66% kudu::rpc::Connection::QueueResponseForCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter<kudu::rpc::InboundCall> >)
                             19.52% impala::TransmitDataResponsePb::~TransmitDataResponsePb()
                             15.30% kudu::rpc::InboundCall::~InboundCall()
                             5.69% kudu::rpc::QueueTransferTask::Run(kudu::rpc::ReactorThread*)
                             3.97% std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::InboundCall*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<unsigned long, kudu::rpc::InboundCall*, st
                             2.44% kudu::rpc::RpcContext::~RpcContext()
                             2.20% kudu::rpc::ReactorThread::AsyncHandler(ev::async&, int)
                             1.91% std::unordered_map<unsigned long, kudu::rpc::Connection::CallAwaitingResponse*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::Connection::CallAwaitingResponse*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<
                             1.05% kudu::Trace::~Trace()
                             0.50% kudu::rpc::Connection::CallAwaitingResponse::~CallAwaitingResponse()
               + 9.38% tcmalloc::ThreadCache::IncreaseCacheLimit()
               + 7.43% tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
               + 1.50% tcmalloc::CentralFreeList::Populate()
               + 1.19% tcmalloc::CentralFreeList::ReleaseToSpans(void*)
            + 1.13% tcmalloc::CentralFreeList::InsertRange(void*, void*, int)
      -   8.95%  impalad  impalad              [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
         - tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
            - 99.71% tcmalloc::ThreadCache::Scavenge()
               - operator delete[](void*, std::nothrow_t const&)
                    27.47% kudu::rpc::Connection::QueueResponseForCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter<kudu::rpc::InboundCall> >)
                  - 22.12% impala::RowBatch::RowBatch(impala::RowDescriptor const&, impala::InboundProtoRowBatch const&, impala::MemTracker*)
                       impala::DataStreamRecvr::SenderQueue::AddBatch(std::unique_ptr<impala::TransmitDataCtx, std::default_delete<impala::TransmitDataCtx> >&&)
                    20.73% impala::TransmitDataResponsePb::~TransmitDataResponsePb()
                    9.98% kudu::rpc::InboundCall::~InboundCall()
                    6.32% kudu::rpc::QueueTransferTask::Run(kudu::rpc::ReactorThread*)
                    4.20% std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::InboundCall*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<u
                    2.03% kudu::rpc::ReactorThread::AsyncHandler(ev::async&, int)
                    1.88% kudu::rpc::RpcContext::~RpcContext()
                    1.00% std::unordered_map<unsigned long, kudu::rpc::Connection::CallAwaitingResponse*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::Connection::CallAwaitingResponse*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<unsigned
                    0.71% kudu::rpc::OutboundCall::~OutboundCall()
                    0.65% kudu::Trace::~Trace()
                    0.64% kudu::rpc::Connection::CallAwaitingResponse::~CallAwaitingResponse()
      +   7.90%  impalad  impalad              [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
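
The profile above attributes 29.20% of the operator new[] samples under FetchFromCentralCache to the impala::RowDescriptor copy constructor. As an illustration only, the sketch below contrasts deep-copying a descriptor for every receiver with sharing one immutable instance through std::shared_ptr. ToyRowDescriptor, CopyingReceiver, SharingReceiver and CountDescriptorCopies are hypothetical names made up for this example. They are not Impala classes, and this is not necessarily how the issue was or should be fixed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row descriptor that owns heap memory.
// It is NOT impala::RowDescriptor; it only mimics "copying allocates".
class ToyRowDescriptor {
 public:
  explicit ToyRowDescriptor(std::size_t num_tuples) : tuple_ids_(num_tuples, 0) {}

  // Deep copy: the vector copy performs a heap allocation on every call.
  ToyRowDescriptor(const ToyRowDescriptor& other) : tuple_ids_(other.tuple_ids_) {
    ++copies;
  }

  std::size_t num_tuples() const { return tuple_ids_.size(); }

  static int copies;  // number of deep copies made so far

 private:
  std::vector<int> tuple_ids_;
};

int ToyRowDescriptor::copies = 0;

// Receiver that keeps its own deep copy of the descriptor.
struct CopyingReceiver {
  explicit CopyingReceiver(const ToyRowDescriptor& d) : desc(d) {}
  ToyRowDescriptor desc;
};

// Receiver that shares a single immutable descriptor.
struct SharingReceiver {
  explicit SharingReceiver(std::shared_ptr<const ToyRowDescriptor> d)
      : desc(std::move(d)) {}
  std::shared_ptr<const ToyRowDescriptor> desc;
};

// Builds num_receivers receivers and returns how many deep copies that took.
int CountDescriptorCopies(bool share, int num_receivers) {
  ToyRowDescriptor::copies = 0;
  const std::size_t n = static_cast<std::size_t>(num_receivers);
  if (share) {
    std::shared_ptr<const ToyRowDescriptor> shared =
        std::make_shared<ToyRowDescriptor>(4);
    std::vector<SharingReceiver> receivers;
    receivers.reserve(n);  // avoid reallocation while filling the vector
    for (std::size_t i = 0; i < n; ++i) receivers.emplace_back(shared);
  } else {
    ToyRowDescriptor base(4);
    std::vector<CopyingReceiver> receivers;
    receivers.reserve(n);  // avoid reallocation, which would add extra copies
    for (std::size_t i = 0; i < n; ++i) receivers.emplace_back(base);
  }
  return ToyRowDescriptor::copies;
}
```

In this toy setup, CountDescriptorCopies(false, n) makes one deep copy per receiver, while CountDescriptorCopies(true, n) makes none. Copying the shared_ptr still updates an atomic reference count, so sharing trades allocator traffic for a cheaper form of synchronization; whether that is appropriate depends on the descriptor's lifetime and ownership in the real code.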
      

              People

              • Assignee:
                mmokhtar Mostafa Mokhtar
                Reporter:
                henryr Henry Robinson