Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:

      Description

      buffer-pool-test hung during execution.

      This is the commit where the hang happened:

      https://github.com/apache/incubator-impala/commit/894bb7785519f4f00d1af52e9c6602c43396503c

      The INFO logs are attached.

        Activity

        jbapple Jim Apple added a comment -

        Downgraded, since this is likely a test problem, not a BE problem.

        tarmstrong Tim Armstrong added a comment -

        The check failures are red herrings - they're all expected failures wrapped in DEBUG_DEATH tests. The bug is that it hangs.

        Summarised stacks are:

             47 pthread_cond_wait@@GLIBC_2.3.2,boost::condition_variable::wait,impala::DiskIoMgr::GetNextRequestRange,impala::DiskIoMgr::WorkLoop,boost::_mfi::mf1<void,,boost::_bi::list2<boost::_bi::value<impala::DiskIoMgr*>,,boost::_bi::bind_t<void,,boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,,boost::function0<void>::operator(),impala::Thread::SuperviseThread,boost::_bi::list4<boost::_bi::value<std::basic_string<char,,boost::_bi::bind_t<void,,boost::detail::thread_data<boost::_bi::bind_t<void,,thread_proxy,start_thread,clone
             18 __lll_lock_wait,_L_lock_854,pthread_mutex_lock,pthread_mutex_lock,at,boost::unique_lock<boost::mutex>::lock,boost::unique_lock<boost::mutex>::unique_lock,impala::BlockingQueue<boost::function<void()>,impala::ThreadPool<boost::function<void()>,boost::_mfi::mf1<void,,boost::_bi::list2<boost::_bi::value<impala::ThreadPool<boost::function<void()>,boost::_bi::bind_t<void,,boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,,boost::function0<void>::operator(),impala::Thread::SuperviseThread,boost::_bi::list4<boost::_bi::value<std::basic_string<char,,boost::_bi::bind_t<void,,boost::detail::thread_data<boost::_bi::bind_t<void,,thread_proxy,start_thread,clone
             15 __lll_lock_wait,_L_lock_854,pthread_mutex_lock,pthread_mutex_lock,at,boost::unique_lock<boost::mutex>::lock,boost::unique_lock<boost::mutex>::unique_lock,impala::BlockingQueue<impala::HdfsOp>::BlockingGet,impala::ThreadPool<impala::HdfsOp>::WorkerThread,boost::_mfi::mf1<void,,boost::_bi::list2<boost::_bi::value<impala::ThreadPool<impala::HdfsOp>*>,,boost::_bi::bind_t<void,,boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,,boost::function0<void>::operator(),impala::Thread::SuperviseThread,boost::_bi::list4<boost::_bi::value<std::basic_string<char,,boost::_bi::bind_t<void,,boost::detail::thread_data<boost::_bi::bind_t<void,,thread_proxy,start_thread,clone
              8 pthread_cond_wait@@GLIBC_2.3.2,os::PlatformEvent::park(),Monitor::IWait(Thread*,,Monitor::wait(bool,,GCTaskManager::get_task(unsigned,GCTaskThread::run(),java_start(Thread*),start_thread,clone
              2 pthread_cond_wait@@GLIBC_2.3.2,os::PlatformEvent::park(),Monitor::IWait(Thread*,,Monitor::wait(bool,,CompileQueue::get(),CompileBroker::compiler_thread_loop(),JavaThread::thread_main_inner(),JavaThread::run(),java_start(Thread*),start_thread,clone
              2 pthread_cond_wait@@GLIBC_2.3.2,impala::ConditionVariable::Wait,impala::BlockingQueue<boost::function<void()>,impala::ThreadPool<boost::function<void()>,boost::_mfi::mf1<void,,boost::_bi::list2<boost::_bi::value<impala::ThreadPool<boost::function<void()>,boost::_bi::bind_t<void,,boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,,boost::function0<void>::operator(),impala::Thread::SuperviseThread,boost::_bi::list4<boost::_bi::value<std::basic_string<char,,boost::_bi::bind_t<void,,boost::detail::thread_data<boost::_bi::bind_t<void,,thread_proxy,start_thread,clone
              1 sem_wait,check_pending_signals(bool),signal_thread_entry(JavaThread*,,JavaThread::thread_main_inner(),JavaThread::run(),java_start(Thread*),start_thread,clone
              1 pthread_cond_wait@@GLIBC_2.3.2,os::PlatformEvent::park(),ObjectMonitor::wait(long,,JVM_MonitorWait,??,??,??,??,??,??
              1 pthread_cond_wait@@GLIBC_2.3.2,os::PlatformEvent::park(),ObjectMonitor::wait(long,,JVM_MonitorWait,??,??,??
              1 pthread_cond_wait@@GLIBC_2.3.2,os::PlatformEvent::park(),Monitor::IWait(Thread*,,Monitor::wait(bool,,ServiceThread::service_thread_entry(JavaThread*,,JavaThread::thread_main_inner(),JavaThread::run(),java_start(Thread*),start_thread,clone
              1 pthread_cond_wait@@GLIBC_2.3.2,impala::ConditionVariable::Wait,impala::BlockingQueue<impala::HdfsOp>::BlockingGet,impala::ThreadPool<impala::HdfsOp>::WorkerThread,boost::_mfi::mf1<void,,boost::_bi::list2<boost::_bi::value<impala::ThreadPool<impala::HdfsOp>*>,,boost::_bi::bind_t<void,,boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,,boost::function0<void>::operator(),impala::Thread::SuperviseThread,boost::_bi::list4<boost::_bi::value<std::basic_string<char,,boost::_bi::bind_t<void,,boost::detail::thread_data<boost::_bi::bind_t<void,,thread_proxy,start_thread,clone
              1 pthread_cond_wait@@GLIBC_2.3.2,boost::condition_variable::wait,impala::DiskIoMgr::CancelContext,impala::DiskIoMgr::UnregisterContext,impala::TmpFileMgr::FileGroup::Close,impala::BufferPoolTest::TearDown,void,testing::Test::Run(),testing::TestInfo::Run(),testing::TestCase::Run(),testing::internal::UnitTestImpl::RunAllTests(),testing::UnitTest::Run(),main
              1 pthread_cond_wait@@GLIBC_2.3.2,boost::condition_variable::wait,impala::AdmissionController::DequeueLoop,boost::_mfi::mf0<void,,boost::_bi::list1<boost::_bi::value<impala::AdmissionController*>,boost::_bi::bind_t<void,,boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,,boost::function0<void>::operator(),impala::Thread::SuperviseThread,boost::_bi::list4<boost::_bi::value<std::basic_string<char,,boost::_bi::bind_t<void,,boost::detail::thread_data<boost::_bi::bind_t<void,,thread_proxy,start_thread,clone
              1 pthread_cond_timedwait@@GLIBC_2.3.2,os::PlatformEvent::park(long),Monitor::IWait(Thread*,,Monitor::wait(bool,,WatcherThread::sleep(),WatcherThread::run(),java_start(Thread*),start_thread,clone
              1 pthread_cond_timedwait@@GLIBC_2.3.2,os::PlatformEvent::park(long),Monitor::IWait(Thread*,,Monitor::wait(bool,,VMThread::loop(),VMThread::run(),java_start(Thread*),start_thread,clone
              1 nanosleep,std::this_thread::sleep_for<long,,impala::SleepForMs,PauseMonitorLoop,boost::detail::function::void_function_invoker0<void,boost::function0<void>::operator(),impala::Thread::SuperviseThread,boost::_bi::list4<boost::_bi::value<std::basic_string<char,,boost::_bi::bind_t<void,,boost::detail::thread_data<boost::_bi::bind_t<void,,thread_proxy,start_thread,clone
              1 nanosleep,std::this_thread::sleep_for<long,,impala::SleepForMs,impala::PeriodicCounterUpdater::UpdateLoop,boost::_mfi::mf0<void,,boost::_bi::list1<boost::_bi::value<impala::PeriodicCounterUpdater*>,boost::_bi::bind_t<void,,boost::detail::thread_data<boost::_bi::bind_t<void,,thread_proxy,start_thread,clone
              1 nanosleep,sleep,MaintenanceThread,boost::detail::function::void_function_invoker0<void,boost::function0<void>::operator(),impala::Thread::SuperviseThread,boost::_bi::list4<boost::_bi::value<std::basic_string<char,,boost::_bi::bind_t<void,,boost::detail::thread_data<boost::_bi::bind_t<void,,thread_proxy,start_thread,clone
              1 __lll_lock_wait,pthread_cond_broadcast@@GLIBC_2.3.2,impala::ConditionVariable::NotifyAll,impala::BufferPool::Client::WriteCompleteCallback,impala::BufferPool::Client::<lambda(const,std::_Function_handler<void(const,std::function<void(const,impala::TmpFileMgr::WriteHandle::WriteComplete,impala::TmpFileMgr::FileGroup::WriteComplete,impala::TmpFileMgr::FileGroup::<lambda(const,std::_Function_handler<void(const,std::function<void(const,impala::DiskIoMgr::HandleWriteFinished,impala::DiskIoMgr::Write,impala::DiskIoMgr::WorkLoop,boost::_mfi::mf1<void,,boost::_bi::list2<boost::_bi::value<impala::DiskIoMgr*>,,boost::_bi::bind_t<void,,boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,,boost::function0<void>::operator(),impala::Thread::SuperviseThread,boost::_bi::list4<boost::_bi::value<std::basic_string<char,,boost::_bi::bind_t<void,,boost::detail::thread_data<boost::_bi::bind_t<void,,thread_proxy,start_thread,clone
              1 
        

        The interesting stacks look to be:

        Thread 1 (Thread 0x7f8d711a8880 (LWP 27380)):                                                       
        #0  0x0000003cf300b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0             
        #1  0x0000000001199fbd in boost::condition_variable::wait (this=0xb53ae88, m=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/thread/pthread/condition_variable.hpp:73
        #2  0x00000000012fa785 in impala::DiskIoMgr::CancelContext (this=0xb363f80, context=0xb53ad20, wait_for_disks_completion=true) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/disk-io-mgr.cc:469
        #3  0x00000000012fa266 in impala::DiskIoMgr::UnregisterContext (this=0xb363f80, reader=0xb53ad20) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/disk-io-mgr.cc:431
        #4  0x00000000012bf88f in impala::TmpFileMgr::FileGroup::Close (this=0xa5bbed0) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/tmp-file-mgr.cc:290
        #5  0x000000000119bc61 in impala::BufferPoolTest::TearDown (this=0x695d840) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/bufferpool/buffer-pool-test.cc:54
        #6  0x00000000029066c3 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
        #7  0x00000000028fd429 in testing::Test::Run() ()                                                   
        #8  0x00000000028fd5a8 in testing::TestInfo::Run() ()                                               
        #9  0x00000000028fd685 in testing::TestCase::Run() ()                                               
        #10 0x00000000028fe908 in testing::internal::UnitTestImpl::RunAllTests() ()                         
        #11 0x00000000028febe3 in testing::UnitTest::Run() ()                                               
        #12 0x000000000119819e in main (argc=2, argv=0x7fff57bfdf28) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/bufferpool/buffer-pool-test.cc:689
        

        I notice that it could potentially hang if CancelContext() is called concurrently by two threads, since the condition variable is only signalled once. I don't think this is the bug though:

          void DecrementDiskRefCount() {
            // boost doesn't let us dcheck that the reader lock is taken
            DCHECK_GT(num_disks_with_ranges_, 0);
            if (--num_disks_with_ranges_ == 0) {
              disks_complete_cond_var_.notify_one();
            }
            DCHECK(Validate()) << std::endl << DebugString();
          }
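
To illustrate the concern about `notify_one()` with two concurrent waiters, here is a minimal standalone sketch. It is not Impala code: `CountWokenWaiters` and its parameters are hypothetical names, and it uses `std::condition_variable` rather than `boost::condition_variable`. Each waiter uses a timed wait so the demonstration terminates instead of hanging; in the scenario described above the wait is untimed, so a waiter that misses the single notification would block indefinitely.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Impala): starts `num_waiters` threads that
// block on one condition variable, sends a single notification, and returns
// how many waiters woke up before their 500 ms timeout expired.
int CountWokenWaiters(int num_waiters, bool use_notify_all) {
  assert(num_waiters > 0);
  std::mutex mu;
  std::condition_variable cv;
  int waiting = 0;  // Waiters that have reached the wait; guarded by mu.
  int woken = 0;    // Waiters that returned without timing out; guarded by mu.

  std::vector<std::thread> waiters;
  for (int i = 0; i < num_waiters; ++i) {
    waiters.emplace_back([&] {
      std::unique_lock<std::mutex> lock(mu);
      ++waiting;
      // wait_for() releases mu atomically while blocking, so once the
      // notifier observes waiting == num_waiters under mu, every waiter
      // is already blocked on cv.
      if (cv.wait_for(lock, std::chrono::milliseconds(500)) ==
          std::cv_status::no_timeout) {
        ++woken;
      }
    });
  }

  // Spin until every waiter is blocked on the condition variable.
  for (;;) {
    std::unique_lock<std::mutex> lock(mu);
    if (waiting == num_waiters) break;
    lock.unlock();
    std::this_thread::yield();
  }

  if (use_notify_all) {
    cv.notify_all();
  } else {
    cv.notify_one();
  }

  for (auto& t : waiters) t.join();
  return woken;  // Safe to read without mu: all waiters have been joined.
}
```

Barring spurious wakeups, `CountWokenWaiters(2, false)` should report one woken waiter and `CountWokenWaiters(2, true)` should report two. Whether switching `DecrementDiskRefCount()` to `notify_all()` is appropriate depends on whether concurrent `CancelContext()` calls are actually possible, which this sketch does not establish.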
        

        I suspect it might be related to this other suspicious stack where a thread is stuck waiting for a lock inside a condition variable - I'm not sure how this is possible.

        Thread 33 (Thread 0x7f8d1bb08700 (LWP 28453)):                                                      
        #0  0x0000003cf300e054 in __lll_lock_wait () from /lib64/libpthread.so.0                            
        #1  0x0000003cf300bdb0 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib64/libpthread.so.0        
        #2  0x00000000011b082c in impala::ConditionVariable::NotifyAll (this=0xafd97f0) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/util/condition-variable.h:57
        #3  0x00000000011aec92 in impala::BufferPool::Client::WriteCompleteCallback (this=0x14e89e80, page=0xafd97b0, write_status=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:606
        #4  0x00000000011ae190 in impala::BufferPool::Client::<lambda(const impala::Status&)>::operator()(const impala::Status &) const (__closure=0xb9f5d10, write_status=...)
            at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:570
        #5  0x00000000011afe29 in std::_Function_handler<void(const impala::Status&), impala::BufferPool::Client::WriteDirtyPagesAsync(int64_t)::<lambda(const impala::Status&)> >::_M_invoke(const std::_Any_data &, const impala::Status &) (__functor=..., __args#0=...)
            at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/functional:2039
        #6  0x00000000012c4eeb in std::function<void(const impala::Status&)>::operator()(const impala::Status &) const (this=0x7f8d1bb07250, __args#0=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/functional:2439
        #7  0x00000000012c26b3 in impala::TmpFileMgr::WriteHandle::WriteComplete (this=0xa5bbde0, write_status=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/tmp-file-mgr.cc:563
        #8  0x00000000012c1579 in impala::TmpFileMgr::FileGroup::WriteComplete (this=0xa5bbed0, handle=0xa5bbde0, write_status=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/tmp-file-mgr.cc:442
        #9  0x00000000012c0230 in impala::TmpFileMgr::FileGroup::<lambda(const impala::Status&)>::operator()(const impala::Status &) const (__closure=0xb9f5cc0, write_status=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/tmp-file-mgr.cc:366
        #10 0x00000000012c305b in std::_Function_handler<void(const impala::Status&), impala::TmpFileMgr::FileGroup::Write(impala::MemRange, impala::TmpFileMgr::WriteDoneCallback, std::unique_ptr<impala::TmpFileMgr::WriteHandle>*)::<lambda(const impala::Status&)> >::_M_invoke(const std::_Any_data &, const impala::Status &) (__functor=..., __args#0=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/functional:2039
        #11 0x00000000012c4eeb in std::function<void(const impala::Status&)>::operator()(const impala::Status &) const (this=0xb7b9ee0, __args#0=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/functional:2439
        #12 0x00000000012fe905 in impala::DiskIoMgr::HandleWriteFinished (this=0xb363f80, writer=0xb53ad20, write_range=0xb7b9e90, write_status=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/disk-io-mgr.cc:967
        #13 0x0000000001300752 in impala::DiskIoMgr::Write (this=0xb363f80, writer_context=0xb53ad20, write_range=0xb7b9e90) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/disk-io-mgr.cc:1192
        #14 0x00000000012ff505 in impala::DiskIoMgr::WorkLoop (this=0xb363f80, disk_queue=0xa9a08c0) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/disk-io-mgr.cc:1064
        #15 0x000000000130be90 in boost::_mfi::mf1<void, impala::DiskIoMgr, impala::DiskIoMgr::DiskQueue*>::operator() (this=0x131eeb00, p=0xb363f80, a1=0xa9a08c0)
            at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/mem_fn_template.hpp:165
        #16 0x000000000130bacd in boost::_bi::list2<boost::_bi::value<impala::DiskIoMgr*>, boost::_bi::value<impala::DiskIoMgr::DiskQueue*> >::operator()<boost::_mfi::mf1<void, impala::DiskIoMgr, impala::DiskIoMgr::DiskQueue*>, boost::_bi::list0> (this=0x131eeb10, f=..., a=...)
            at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:313
        #17 0x000000000130b223 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::DiskIoMgr, impala::DiskIoMgr::DiskQueue*>, boost::_bi::list2<boost::_bi::value<impala::DiskIoMgr*>, boost::_bi::value<impala::DiskIoMgr::DiskQueue*> > >::operator() (this=0x131eeb00)
            at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind_template.hpp:20
        #18 0x000000000130a677 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::DiskIoMgr, impala::DiskIoMgr::DiskQueue*>, boost::_bi::list2<boost::_bi::value<impala::DiskIoMgr*>, boost::_bi::value<impala::DiskIoMgr::DiskQueue*> > >, void>::invoke (function_obj_ptr=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/function/function_template.hpp:153
        #19 0x00000000012681b8 in boost::function0<void>::operator() (this=0x7f8d1bb07c40) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/function/function_template.hpp:767
        #20 0x0000000001579897 in impala::Thread::SuperviseThread (name=..., category=..., functor=..., thread_started=0x7fff57bfd610) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/util/thread.cc:317
        #21 0x0000000001580870 in boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> >::operator()<void (*)(const std::basic_string<char>&, const std::basic_string<char>&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, boost::function<void()>, impala::Promise<long> *), boost::_bi::list0 &, int) (this=0x9e59dc0, f=@0x9e59db8, a=...)
            at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:457
        #22 0x00000000015807b3 in boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > >::operator()(void) (this=0x9e59db8) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind_template.hpp:20
        #23 0x000000000158070e in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > > >::run(void) (this=0x9e59c00) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/thread/detail/thread.hpp:116
        #24 0x0000000001a9b94a in thread_proxy ()                                                           
        #25 0x0000003cf3007851 in start_thread () from /lib64/libpthread.so.0                               
        #26 0x0000003cf2ce894d in clone () from /lib64/libc.so.6       
        
at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:457 #22 0x00000000015807b3 in boost::_bi::bind_t<void, void (*)( const std::basic_string< char , std::char_traits< char >, std::allocator< char > >&, const std::basic_string< char , std::char_traits< char >, std::allocator< char > >&, boost::function<void()>, impala::Promise< long int >*), boost::_bi::list4<boost::_bi::value<std::basic_string< char , std::char_traits< char >, std::allocator< char > > >, boost::_bi::value<std::basic_string< char , std::char_traits< char >, std::allocator< char > > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise< long int >*> > >:: operator ()(void) ( this =0x9e59db8) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind_template.hpp:20 #23 0x000000000158070e in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)( const std::basic_string< char , std::char_traits< char >, std::allocator< char > >&, const std::basic_string< char , std::char_traits< char >, std::allocator< char > >&, boost::function<void()>, impala::Promise< long int >*), boost::_bi::list4<boost::_bi::value<std::basic_string< char , std::char_traits< char >, std::allocator< char > > >, boost::_bi::value<std::basic_string< char , std::char_traits< char >, std::allocator< char > > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise< long int >*> > > >::run(void) ( this =0x9e59c00) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0-p1/include/boost/thread/detail/thread.hpp:116 #24 0x0000000001a9b94a in thread_proxy () #25 0x0000003cf3007851 in start_thread () from /lib64/libpthread.so.0 #26 0x0000003cf2ce894d in clone () from /lib64/libc.so.6
        tarmstrong Tim Armstrong added a comment -

        I think the hang was in EvictPageSameClient, based on counting the number of "Logging initialized" messages in the INFO logs. If it happens again, it would be good to collect the entire BE test log directory, which includes more output.

        jbapple Jim Apple added a comment -

        I started this job to run the BE tests 100 times in succession:

        http://jenkins.impala.io:8080/view/Utility/job/ubuntu-14.04-from-scratch/943/

        tarmstrong Tim Armstrong added a comment -

        Looks like it's a race in tearing down the page and client. I was able to reproduce it with this diff:

        tarmstrong@tarmstrong-box:~/Impala/incubator-impala$ git diff
        diff --git a/be/src/runtime/bufferpool/buffer-pool.cc b/be/src/runtime/bufferpool/buffer-pool.cc
        index 863effc..f5205e1 100644
        --- a/be/src/runtime/bufferpool/buffer-pool.cc
        +++ b/be/src/runtime/bufferpool/buffer-pool.cc
        @@ -602,6 +602,7 @@ void BufferPool::Client::WriteCompleteCallback(Page* page, const Status& write_s
             in_flight_write_bytes_ -= page->len;
             WriteDirtyPagesAsync(); // Start another asynchronous write if needed.
           }
        +  SleepForMs(1000);
           write_complete_cv_.NotifyAll();
           page->write_complete_cv_.NotifyAll();
         }
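
The teardown race can be sketched outside Impala. This is a minimal illustration, not the BufferPool code: `PageSketch` and `RunWriteThenTeardown` are hypothetical names, and the sketch assumes the problem is that a waiting thread may free the page as soon as the write is no longer marked in flight. Here the writer signals while it still holds the lock, so the waiter cannot see the cleared flag and free the page until the writer has finished touching it. A delay between clearing the flag and signalling, like the `SleepForMs(1000)` above, is only dangerous if the signal happens after the lock is released.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a page with one in-flight write (not Impala's Page).
struct PageSketch {
  bool write_in_flight = true;
  std::condition_variable write_complete_cv;
};

// Returns true once the page has been torn down after its write completed.
bool RunWriteThenTeardown() {
  std::mutex lock;  // Outlives the page in this sketch.
  auto page = std::make_unique<PageSketch>();
  PageSketch* raw = page.get();

  std::thread writer([&lock, raw] {
    std::lock_guard<std::mutex> l(lock);
    raw->write_in_flight = false;
    // Signal while still holding the lock: the waiter cannot observe the
    // cleared flag (and free the page) until this scope exits.
    raw->write_complete_cv.notify_all();
  });

  {
    std::unique_lock<std::mutex> l(lock);
    page->write_complete_cv.wait(l, [&page] { return !page->write_in_flight; });
    assert(!page->write_in_flight);
  }
  page.reset();  // The writer no longer touches the page at this point.
  writer.join();
  return page == nullptr;
}
```

Calling `RunWriteThenTeardown()` from a test or `main` exercises one write followed by teardown. If the `notify_all()` call were moved after the `lock_guard` scope, the waiter could destroy the page before the call ran.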
        
        tarmstrong Tim Armstrong added a comment -

        IMPALA-4946: fix hang in BufferPool

        Once the write is removed from the "in flight" list,
        both the Client and Page may be destroyed by a different
        thread. The fix is to signal the condition variables
        inside the critical section that removes the write from
        the in flight list.

        Also fix a potential pitfall with DiskIoMgr::CancelContext()
        where concurrent calls to the method, which can be called
        asynchronously with other methods, could result in a hang in
        DiskIoMgr::CancelContext(). I do not believe any Impala code
        calls it concurrently from multiple threads, so the bug was
        only latent.

        Testing:
        I was able to reproduce reliably by inserting a 1s sleep before
        the NotifyAll() calls. After the fix, the hang didn't reproduce
        with sleeps inside or outside the critical section.

        I could not come up with a unit test that had a higher reproduction
        rate than the current tests - the window for the race is very small.
        I considered adding a debug stress option to insert these delays,
        but with all the code moved into the critical section it wouldn't
        be useful.

        Change-Id: I13fc95b5a664544dee789c4107fccf81d2077347
        Reviewed-on: http://gerrit.cloudera.org:8080/6224
        Reviewed-by: Dan Hecht <dhecht@cloudera.com>
        Tested-by: Impala Public Jenkins
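
The CancelContext() pitfall mentioned in the commit message can be illustrated with a small sketch. The commit message does not say how it was fixed, so this assumes one plausible approach: waking every waiter when the last disk finishes. `ContextSketch`, `WaitForDisks` and `RunConcurrentWaiters` are hypothetical names, not the DiskIoMgr API.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical context whose cancellation waits for every disk to finish.
class ContextSketch {
 public:
  explicit ContextSketch(int num_disks) : num_disks_with_ranges_(num_disks) {}

  // Called once per disk when it has no more ranges for this context.
  void DecrementDiskRefCount() {
    std::lock_guard<std::mutex> l(lock_);
    if (--num_disks_with_ranges_ == 0) {
      // Wake every waiter. A single notify_one() here would wake only one
      // of several concurrent waiters and could leave the rest blocked.
      disks_complete_cv_.notify_all();
    }
  }

  // May be called by several threads at once; returns when no disk remains.
  void WaitForDisks() {
    std::unique_lock<std::mutex> l(lock_);
    disks_complete_cv_.wait(l, [this] { return num_disks_with_ranges_ == 0; });
  }

 private:
  std::mutex lock_;
  std::condition_variable disks_complete_cv_;
  int num_disks_with_ranges_;
};

// Starts num_waiters concurrent waiters and returns how many of them returned.
int RunConcurrentWaiters(int num_waiters, int num_disks) {
  ContextSketch ctx(num_disks);
  std::atomic<int> returned{0};
  std::vector<std::thread> waiters;
  for (int i = 0; i < num_waiters; ++i) {
    waiters.emplace_back([&ctx, &returned] {
      ctx.WaitForDisks();
      ++returned;
    });
  }
  for (int i = 0; i < num_disks; ++i) ctx.DecrementDiskRefCount();
  for (auto& t : waiters) t.join();
  return returned.load();
}
```

With `notify_all()`, `RunConcurrentWaiters(2, 3)` returns 2 because both waiters are released. With `notify_one()`, a second waiter that was already blocked might never wake, which matches the latent hang described in the commit message.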


          People

          • Assignee:
            tarmstrong Tim Armstrong
            Reporter:
            henryr Henry Robinson