Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Versions: Impala 2.10.0, Impala 2.11.0
Description
IMPALA-4623 introduced a locking scheme for the file handle cache that splits it into 16 buckets. With only 16 buckets, IO threads contend for the bucket locks, which limits system throughput.
Most IO threads end up in one of these two stacks:
#0 0x0000000002085d47 in base::internal::SpinLockDelay(int volatile*, int, int) ()
#1 0x0000000002085c29 in base::SpinLock::SlowLock() ()
#2 0x00000000010fa76d in impala::io::FileHandleCache<16ul>::GetFileHandle(hdfs_internal* const&, std::string*, long, bool, bool*) ()
#3 0x00000000010f6e22 in impala::io::DiskIoMgr::GetCachedHdfsFileHandle(hdfs_internal* const&, std::string*, long, impala::io::RequestContext*, bool) ()
#4 0x00000000010fd514 in impala::io::ScanRange::Open(bool) ()
#5 0x00000000010f691f in impala::io::DiskIoMgr::ReadRange(impala::io::DiskIoMgr::DiskQueue*, impala::io::RequestContext*, impala::io::ScanRange*) ()
#6 0x00000000010f6dc4 in impala::io::DiskIoMgr::WorkLoop(impala::io::DiskIoMgr::DiskQueue*) ()
#7 0x0000000000d13333 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*) ()
#8 0x0000000000d13a74 in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > > >::run() ()
#9 0x000000000128ea3a in thread_proxy ()
#10 0x00007f49f2bbadc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f49f28e976d in clone () from /lib64/libc.so.6

#0 0x0000000002085d47 in base::internal::SpinLockDelay(int volatile*, int, int) ()
#1 0x0000000002085c29 in base::SpinLock::SlowLock() ()
#2 0x00000000010f9929 in impala::io::FileHandleCache<16ul>::ReleaseFileHandle(std::string*, impala::io::HdfsFileHandle*, bool) ()
#3 0x00000000010fe69e in impala::io::ScanRange::Close() ()
#4 0x00000000010f6565 in impala::io::DiskIoMgr::HandleReadFinished(impala::io::DiskIoMgr::DiskQueue*, impala::io::RequestContext*, std::unique_ptr<impala::io::BufferDescriptor, std::default_delete<impala::io::BufferDescriptor> >) ()
#5 0x00000000010f695b in impala::io::DiskIoMgr::ReadRange(impala::io::DiskIoMgr::DiskQueue*, impala::io::RequestContext*, impala::io::ScanRange*) ()
#6 0x00000000010f6dc4 in impala::io::DiskIoMgr::WorkLoop(impala::io::DiskIoMgr::DiskQueue*) ()
#7 0x0000000000d13333 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*) ()
#8 0x0000000000d13a74 in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > > >::run() ()
#9 0x000000000128ea3a in thread_proxy ()
#10 0x00007f49f2bbadc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f49f28e976d in clone () from /lib64/libc.so.6
Increasing the number of partitions to 256 made the contention go away. A simple fix would be to make the number of partitions a startup flag and set it to 256.
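The proposed fix can be sketched roughly as follows. This is a minimal illustration, not Impala's actual FileHandleCache: the stacks above show that the real cache is templated on the partition count (FileHandleCache<16ul>) and uses a spin lock, whereas the hypothetical PartitionedCache below takes the partition count as a constructor argument (which a startup flag could supply) and uses std::mutex for simplicity. Handle, GetHandle and PartitionIndex are likewise made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a cached HDFS file handle.
struct Handle {
  std::string fname;
};

// Hypothetical cache split into independently locked partitions.
// The partition count is chosen at runtime instead of at compile time.
class PartitionedCache {
 public:
  explicit PartitionedCache(std::size_t num_partitions)
      : partitions_(num_partitions) {
    assert(num_partitions > 0);
  }

  std::size_t num_partitions() const { return partitions_.size(); }

  // Maps a file name to the partition that owns it.
  std::size_t PartitionIndex(const std::string& fname) const {
    return std::hash<std::string>()(fname) % partitions_.size();
  }

  // Returns the cached handle for fname, creating it on a miss.
  // Only the owning partition's lock is held, so threads working on
  // files in different partitions do not block each other.
  std::shared_ptr<Handle> GetHandle(const std::string& fname) {
    Partition& p = partitions_[PartitionIndex(fname)];
    std::lock_guard<std::mutex> l(p.lock);
    auto it = p.handles.find(fname);
    if (it != p.handles.end()) return it->second;
    auto h = std::make_shared<Handle>(Handle{fname});
    p.handles.emplace(fname, h);
    return h;
  }

 private:
  struct Partition {
    std::mutex lock;
    std::unordered_map<std::string, std::shared_ptr<Handle>> handles;
  };
  std::vector<Partition> partitions_;
};
```

With more partitions, two IO threads are less likely to map to the same lock at the same time, which is consistent with the contention disappearing at 256.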
Usually you want a prime number of buckets in a hash table, e.g. 257 instead of 256.
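The reasoning behind a prime bucket count can be illustrated with a small sketch. It assumes the bucket is chosen as hash % num_buckets and that the hash values share a common stride, which is a deliberately bad case; a well-mixed hash function does not behave this way. CountUsedBuckets is a hypothetical helper, not part of Impala.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Counts how many distinct buckets are hit when num_keys hash values
// 0, stride, 2*stride, ... are mapped to buckets with a plain modulo.
std::size_t CountUsedBuckets(std::uint64_t num_buckets, std::uint64_t stride,
                             std::size_t num_keys) {
  assert(num_buckets > 0);
  std::set<std::uint64_t> used;
  for (std::size_t i = 0; i < num_keys; ++i) {
    std::uint64_t hash = i * stride;  // artificially patterned hash value
    used.insert(hash % num_buckets);
  }
  return used.size();
}
```

Under these assumptions, CountUsedBuckets(256, 16, 1000) touches only 16 buckets because 16 divides 256, while CountUsedBuckets(257, 16, 1000) touches all 257 because 257 shares no factor with the stride.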