Details
- Type: Sub-task
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Impala 2.10.0
- None
Description
Our testing has revealed that under high concurrency (e.g. the many_independent_fragment_instances primitive), KRPC slows down execution significantly.
This JIRA tracks the overall issue and links to JIRAs for specific spot fixes. The perf output below was collected on one node of a 16-node cluster while running the many_independent_fragment_instances primitive.
- 13.12% impalad impalad [.] tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
   - tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
      - 93.95% tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
         - tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)
            - 98.16% operator new[](unsigned long)
                 29.20% impala::RowDescriptor::RowDescriptor(impala::RowDescriptor const&)
                 16.85% kudu::rpc::Connection::QueueResponseForCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter<kudu::rpc::InboundCall> >)
                 12.58% impala::DataStreamRecvr::SenderQueue::AddBatch(std::unique_ptr<impala::TransmitDataCtx, std::default_delete<impala::TransmitDataCtx> >&&)
                 7.42% kudu::rpc::OutboundTransfer::CreateForCallResponse(std::vector<kudu::Slice, std::allocator<kudu::Slice> > const&, kudu::rpc::TransferCallbacks*)
               + 4.34% impala::Codec::CreateDecompressor(impala::MemPool*, bool, impala::THdfsCompression::type, boost::scoped_ptr<impala::Codec>*)
                 4.09% kudu::Trace::Trace()
                 3.79% std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)
               + 3.59% kudu::rpc::InboundCall::InboundCall(kudu::rpc::Connection*)
                 2.66% void std::vector<impala::MemPool::ChunkInfo, std::allocator<impala::MemPool::ChunkInfo> >::_M_emplace_back_aux<impala::MemPool::ChunkInfo>(impala::MemPool::ChunkInfo&&)
               + 2.57% kudu::rpc::Connection::HandleIncomingCall(gscoped_ptr<kudu::rpc::InboundTransfer, kudu::DefaultDeleter<kudu::rpc::InboundTransfer> >)
                 2.04% std::vector<kudu::Slice, std::allocator<kudu::Slice> >::reserve(unsigned long)
                 1.92% kudu::rpc::RequestHeader::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)
                 1.91% kudu::rpc::RemoteMethodPB::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)
                 1.48% kudu::rpc::Connection::ReadHandler(ev::io&, int)
                 0.87% kudu::HeapBufferAllocator::AllocateInternal(unsigned long, unsigned long, kudu::BufferAllocator*)
                 0.79% kudu::faststring::GrowArray(unsigned long)
                 0.72% kudu::rpc::OutboundTransfer::CreateForCallRequest(int, std::vector<kudu::Slice, std::allocator<kudu::Slice> > const&, kudu::rpc::TransferCallbacks*)
                 0.69% kudu::rpc::Connection::QueueOutboundCall(std::shared_ptr<kudu::rpc::OutboundCall> const&)
                 0.69% kudu::ArenaBase<true>::ArenaBase(unsigned long, unsigned long)
                 0.68% void std::vector<std::unique_ptr<kudu::ArenaBase<true>::Component, std::default_delete<kudu::ArenaBase<true>::Component> >, std::allocator<std::unique_ptr<kudu::ArenaBase<true>::Component, std::default_delete<kudu::ArenaBase<true>::Component> > > >::_M_emplace_back_aux<std::unique_ptr<kudu::A
                 0.57% impala::TransmitDataResponsePb::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)
            + 1.84% tc_malloc
      + 3.03% tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)
      + 3.02% tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)
- 12.49% impalad impalad [.] SpinLock::SpinLoop()
   - SpinLock::SpinLoop()
      - 98.56% SpinLock::SlowLock()
         - 80.48% tcmalloc::CentralFreeList::InsertRange(void*, void*, int)
            - tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
               - 99.99% tcmalloc::ThreadCache::Scavenge()
                  - operator delete[](void*, std::nothrow_t const&)
                     - 22.51% impala::RowBatch::RowBatch(impala::RowDescriptor const&, impala::InboundProtoRowBatch const&, impala::MemTracker*)
                          impala::DataStreamRecvr::SenderQueue::AddBatch(std::unique_ptr<impala::TransmitDataCtx, std::default_delete<impala::TransmitDataCtx> >&&)
                       21.66% kudu::rpc::Connection::QueueResponseForCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter<kudu::rpc::InboundCall> >)
                       19.52% impala::TransmitDataResponsePb::~TransmitDataResponsePb()
                       15.30% kudu::rpc::InboundCall::~InboundCall()
                       5.69% kudu::rpc::QueueTransferTask::Run(kudu::rpc::ReactorThread*)
                       3.97% std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::InboundCall*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<unsigned long, kudu::rpc::InboundCall*, st
                       2.44% kudu::rpc::RpcContext::~RpcContext()
                       2.20% kudu::rpc::ReactorThread::AsyncHandler(ev::async&, int)
                       1.91% std::unordered_map<unsigned long, kudu::rpc::Connection::CallAwaitingResponse*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::Connection::CallAwaitingResponse*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<
                       1.05% kudu::Trace::~Trace()
                       0.50% kudu::rpc::Connection::CallAwaitingResponse::~CallAwaitingResponse()
         + 9.38% tcmalloc::ThreadCache::IncreaseCacheLimit()
         + 7.43% tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
         + 1.50% tcmalloc::CentralFreeList::Populate()
         + 1.19% tcmalloc::CentralFreeList::ReleaseToSpans(void*)
      + 1.13% tcmalloc::CentralFreeList::InsertRange(void*, void*, int)
- 8.95% impalad impalad [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
   - tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
      - 99.71% tcmalloc::ThreadCache::Scavenge()
         - operator delete[](void*, std::nothrow_t const&)
              27.47% kudu::rpc::Connection::QueueResponseForCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter<kudu::rpc::InboundCall> >)
            - 22.12% impala::RowBatch::RowBatch(impala::RowDescriptor const&, impala::InboundProtoRowBatch const&, impala::MemTracker*)
                 impala::DataStreamRecvr::SenderQueue::AddBatch(std::unique_ptr<impala::TransmitDataCtx, std::default_delete<impala::TransmitDataCtx> >&&)
              20.73% impala::TransmitDataResponsePb::~TransmitDataResponsePb()
              9.98% kudu::rpc::InboundCall::~InboundCall()
              6.32% kudu::rpc::QueueTransferTask::Run(kudu::rpc::ReactorThread*)
              4.20% std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::InboundCall*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<u
              2.03% kudu::rpc::ReactorThread::AsyncHandler(ev::async&, int)
              1.88% kudu::rpc::RpcContext::~RpcContext()
              1.00% std::unordered_map<unsigned long, kudu::rpc::Connection::CallAwaitingResponse*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, kudu::rpc::Connection::CallAwaitingResponse*> > >::mapped_type EraseKeyReturnValuePtr<std::unordered_map<unsigned
              0.71% kudu::rpc::OutboundCall::~OutboundCall()
              0.65% kudu::Trace::~Trace()
              0.64% kudu::rpc::Connection::CallAwaitingResponse::~CallAwaitingResponse()
+ 7.90% impalad impalad [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
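For illustration only: the profile above attributes 29.20% of the operator new[] samples to the impala::RowDescriptor copy constructor, and the linked IMPALA-5481 proposes sharing descriptors rather than copying them. The following is a minimal C++ sketch of that idea, not Impala code. Descriptor, CopyingBatch, SharingBatch and the two helper functions are hypothetical names introduced here, and the sketch only counts descriptor copies; it does not measure tcmalloc contention.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row descriptor (NOT impala::RowDescriptor).
// Copying it copies a vector, which means a heap allocation per copy.
struct Descriptor {
  std::vector<int> tuple_ids;
  static std::size_t copies;  // number of copy-constructions so far

  explicit Descriptor(std::vector<int> ids) : tuple_ids(std::move(ids)) {}
  Descriptor(const Descriptor& other) : tuple_ids(other.tuple_ids) { ++copies; }
};
std::size_t Descriptor::copies = 0;

// Hypothetical per-batch object that owns a private copy of the descriptor.
struct CopyingBatch {
  Descriptor desc;
  explicit CopyingBatch(const Descriptor& d) : desc(d) {}
};

// Hypothetical per-batch object that shares one immutable descriptor.
struct SharingBatch {
  std::shared_ptr<const Descriptor> desc;
  explicit SharingBatch(std::shared_ptr<const Descriptor> d)
      : desc(std::move(d)) {}
};

// Returns how many descriptor copies are made when every batch copies.
std::size_t CopiesWhenCopying(int num_batches) {
  Descriptor::copies = 0;
  Descriptor d(std::vector<int>{0, 1, 2});
  for (int i = 0; i < num_batches; ++i) {
    CopyingBatch batch(d);  // one descriptor copy (and allocation) per batch
    (void)batch;
  }
  return Descriptor::copies;
}

// Returns how many descriptor copies are made when batches share a pointer.
std::size_t CopiesWhenSharing(int num_batches) {
  Descriptor::copies = 0;
  auto d = std::make_shared<const Descriptor>(std::vector<int>{0, 1, 2});
  for (int i = 0; i < num_batches; ++i) {
    SharingBatch batch(d);  // only a reference-count increment per batch
    (void)batch;
  }
  return Descriptor::copies;
}
```

Calling CopiesWhenCopying(n) yields n descriptor copies, while CopiesWhenSharing(n) yields none, because each batch only bumps a reference count. Whether the actual fix takes this exact shape is not stated in this issue.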
Issue Links
- breaks
  - IMPALA-6414 Impalad binary failed to start with SIGSEGV with GPerfTools 2.6.3 on certain platforms (Resolved)
- is blocked by
  - IMPALA-5518 Allocate KrpcDataStreamRecvr RowBatch tuples from BufferPool (Resolved)
  - IMPALA-5481 RowDescriptors should be shared, rather than copied (Resolved)
- relates to
  - KUDU-1865 Create fast path for RespondSuccess() in KRPC (Open)