IMPALA / IMPALA-2567 KRPC milestone 1 / IMPALA-5528

tcmalloc contention much higher with concurrency after KRPC patch

    Description

      Our testing has revealed that under high concurrency (e.g. the many_independent_fragment_instances primitive), KRPC slows down execution significantly.

      This JIRA tracks the overall issue and links to JIRAs for specific spot fixes. The profile that follows was captured with perf on one node of a 16-node cluster while running the many_independent_fragment_instances primitive.

      -  13.12%  impalad  impalad              [.] tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
         - tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
            - 93.95% tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
               - tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)
                  - 98.16% operator new[](unsigned long)
                       29.20% impala::RowDescriptor::RowDescriptor(impala::RowDescriptor const&)
                       16.85% kudu::rpc::Connection::QueueResponseForCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter<kudu::rpc::InboundCall> >)
                       12.58% impala::DataStreamRecvr::SenderQueue::AddBatch(std::unique_ptr<impala::TransmitDataCtx, std::default_delete<impala::TransmitDataCtx> >&&)
                       7.42% kudu::rpc::OutboundTransfer::CreateForCallResponse(std::vector<kudu::Slice, std::allocator<kudu::Slice> > const&, kudu::rpc::TransferCallbacks*)
                     + 4.34% impala::Codec::CreateDecompressor(impala::MemPool*, bool, impala::THdfsCompression::type, boost::scoped_ptr<impala::Codec>*)
                       4.09% kudu::Trace::Trace()
                       3.79% std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)
                     + 3.59% kudu::rpc::InboundCall::InboundCall(kudu::rpc::Connection*)
                       2.66% void std::vector<impala::MemPool::ChunkInfo, std::allocator<impala::MemPool::ChunkInfo> >::_M_emplace_back_aux<impala::MemPool::ChunkInfo>(impala::MemPool::ChunkInfo&&)
                     + 2.57% kudu::rpc::Connection::HandleIncomingCall(gscoped_ptr<kudu::rpc::InboundTransfer, kudu::DefaultDeleter<kudu::rpc::InboundTransfer> >)
                       2.04% std::vector<kudu::Slice, std::allocator<kudu::Slice> >::reserve(unsigned long)
                       1.92% kudu::rpc::RequestHeader::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)
                       1.91% kudu::rpc::RemoteMethodPB::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)
                       1.48% kudu::rpc::Connection::ReadHandler(ev::io&, int)
                       0.87% kudu::HeapBufferAllocator::AllocateInternal(unsigned long, unsigned long, kudu::BufferAllocator*)
                       0.79% kudu::faststring::GrowArray(unsigned long)
                       0.72% kudu::rpc::OutboundTransfer::CreateForCallRequest(int, std::vector<kudu::Slice, std::allocator<kudu::Slice> > const&, kudu::rpc::TransferCallbacks*)
                       0.69% kudu::rpc::Connection::QueueOutboundCall(std::shared_ptr<kudu::rpc::OutboundCall> const&)
                       0.69% kudu::ArenaBase<true>::ArenaBase(unsigned long, unsigned long)
                       0.68% void std::vector<std::unique_ptr<kudu::ArenaBase<true>::Component, std::default_delete<kudu::ArenaBase<true>::Component> >, std::allocator<std::unique_ptr<kudu::ArenaBase<true>::Component, std::default_delete<kudu::ArenaBase<true>::Component> > > >::_M_emplace_back_aux<std::unique_ptr<kudu::A
                       0.57% impala::TransmitDataResponsePb::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)
                  + 1.84% tc_malloc
            + 3.03% tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)
            + 3.02% tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)
      -  12.49%  impalad  impalad              [.] SpinLock::SpinLoop()
         - SpinLock::SpinLoop()
            - 98.56% SpinLock::SlowLock()
               - 80.48% tcmalloc::CentralFreeList::InsertRange(void*, void*, int)
                  - tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
                     - 99.99% tcmalloc::ThreadCache::Scavenge()
                        - operator delete[](void*, std::nothrow_t const&)
                           - 22.51% impala::RowBatch::RowBatch(impala::RowDescriptor const&, impala::InboundProtoRowBatch const&, impala::MemTracker*)
                                impala::DataStreamRecvr::SenderQueue::AddBatch(std::unique_ptr<impala::TransmitDataCtx, std::default_delete<impala::TransmitDataCtx> >&&)
                             21.66% kudu::rpc::Connection::QueueResponseForCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter<kudu::rpc::InboundCall> >)
                             19.52% impala::TransmitDataResponsePb::~TransmitDataResponsePb()
                             15.30% kudu::rpc::InboundCall::~InboundCall()
                             5.69% kudu::rpc::QueueTransferTask::Run(kudu::rpc::ReactorThread*)
                             3.97% std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::InboundCall*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<unsigned long, kudu::rpc::InboundCall*, st
                             2.44% kudu::rpc::RpcContext::~RpcContext()
                             2.20% kudu::rpc::ReactorThread::AsyncHandler(ev::async&, int)
                             1.91% std::unordered_map<unsigned long, kudu::rpc::Connection::CallAwaitingResponse*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::Connection::CallAwaitingResponse*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<
                             1.05% kudu::Trace::~Trace()
                             0.50% kudu::rpc::Connection::CallAwaitingResponse::~CallAwaitingResponse()
               + 9.38% tcmalloc::ThreadCache::IncreaseCacheLimit()
               + 7.43% tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
               + 1.50% tcmalloc::CentralFreeList::Populate()
               + 1.19% tcmalloc::CentralFreeList::ReleaseToSpans(void*)
            + 1.13% tcmalloc::CentralFreeList::InsertRange(void*, void*, int)
      -   8.95%  impalad  impalad              [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
         - tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
            - 99.71% tcmalloc::ThreadCache::Scavenge()
               - operator delete[](void*, std::nothrow_t const&)
                    27.47% kudu::rpc::Connection::QueueResponseForCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter<kudu::rpc::InboundCall> >)
                  - 22.12% impala::RowBatch::RowBatch(impala::RowDescriptor const&, impala::InboundProtoRowBatch const&, impala::MemTracker*)
                       impala::DataStreamRecvr::SenderQueue::AddBatch(std::unique_ptr<impala::TransmitDataCtx, std::default_delete<impala::TransmitDataCtx> >&&)
                    20.73% impala::TransmitDataResponsePb::~TransmitDataResponsePb()
                    9.98% kudu::rpc::InboundCall::~InboundCall()
                    6.32% kudu::rpc::QueueTransferTask::Run(kudu::rpc::ReactorThread*)
                    4.20% std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::InboundCall*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<u
                    2.03% kudu::rpc::ReactorThread::AsyncHandler(ev::async&, int)
                    1.88% kudu::rpc::RpcContext::~RpcContext()
                    1.00% std::unordered_map<unsigned long, kudu::rpc::Connection::CallAwaitingResponse*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::Connection::CallAwaitingResponse*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<unsigned
                    0.71% kudu::rpc::OutboundCall::~OutboundCall()
                    0.65% kudu::Trace::~Trace()
                    0.64% kudu::rpc::Connection::CallAwaitingResponse::~CallAwaitingResponse()
      +   7.90%  impalad  impalad              [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
      

            People

              mmokhtar Mostafa Mokhtar
              henryr Henry Robinson