Description
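TabletCopyService holds the service-wide lock while initializing each session. Because session initialization opens blocks on disk, every other tablet copy RPC that needs the lock is blocked for the duration of that I/O. Under heavy load, this can cause starvation, timeouts, and failures. Dan captured a pstack showing many threads waiting on the same mutex while the lock holder was doing I/O: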
#2 0x0000000001a45d72 in kudu::fs::LogBlockManager::OpenBlock(kudu::BlockId const&, std::unique_ptr<kudu::fs::ReadableBlock, std::default_delete<kudu::fs::ReadableBlock> >*) ()
#3 0x0000000001a33939 in kudu::FsManager::OpenBlock(kudu::BlockId const&, std::unique_ptr<kudu::fs::ReadableBlock, std::default_delete<kudu::fs::ReadableBlock> >*) ()
#4 0x00000000008e5edf in kudu::tserver::TabletCopySourceSession::OpenBlockUnlocked(kudu::BlockId const&) ()
#5 0x00000000008e7b46 in kudu::tserver::TabletCopySourceSession::Init() ()
#6 0x00000000008e1165 in kudu::tserver::TabletCopyServiceImpl::BeginTabletCopySession(kudu::tserver::BeginTabletCopySessionRequestPB const*, kudu::tserver::BeginTabletCopySessionResponsePB*, kudu::rpc::RpcContext*) ()
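One possible way to avoid this is to take the service-wide lock only for the session map lookup and insert, and to run the slow initialization outside of it. The sketch below illustrates that pattern only. Session, CopyService, BeginSession and num_sessions are hypothetical stand-ins, not Kudu's actual classes, and this is not necessarily how the issue was or should be fixed.

```cpp
#include <chrono>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>

// Hypothetical stand-in for a copy session whose Init() does slow disk I/O.
class Session {
 public:
  bool Init() {
    // Simulate the block-opening I/O seen in the stack trace.
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    initialized_ = true;
    return true;
  }
  bool initialized() const { return initialized_; }

 private:
  bool initialized_ = false;
};

// Hypothetical stand-in for the service that owns the session map.
class CopyService {
 public:
  std::shared_ptr<Session> BeginSession(const std::string& id) {
    {
      // Short critical section: map lookup only, no I/O.
      std::lock_guard<std::mutex> l(lock_);
      auto it = sessions_.find(id);
      if (it != sessions_.end()) return it->second;
    }

    // Slow initialization runs without the service-wide lock held,
    // so other requests are not blocked behind this session's I/O.
    auto session = std::make_shared<Session>();
    if (!session->Init()) return nullptr;

    // Short critical section: publish the initialized session. If another
    // thread registered the same id in the meantime, emplace() leaves the
    // map unchanged and we return the existing session instead of ours.
    std::lock_guard<std::mutex> l(lock_);
    return sessions_.emplace(id, session).first->second;
  }

  std::size_t num_sessions() {
    std::lock_guard<std::mutex> l(lock_);
    return sessions_.size();
  }

 private:
  std::mutex lock_;  // Protects sessions_ only.
  std::unordered_map<std::string, std::shared_ptr<Session>> sessions_;
};
```

The trade-off in this sketch is that two concurrent requests for the same id may both pay the initialization cost, with one result discarded. Whether that is acceptable, or whether a per-session lock is preferable, depends on how expensive Init() is in practice.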