Details
-
Bug
-
Status: Resolved
-
Trivial
-
Resolution: Fixed
-
Impala 2.8.0
-
None
Description
I believe recent changes to coordination and distributed execution have broken the WAIT debug action when called in some phases, e.g. PREPARE.
The following repro leads to a hung query that cannot be cancelled. Impala's web UI also hangs, so the query cannot be cancelled from there either.
set debug_action="0:PREPARE:WAIT";
select 1 from functional.alltypes;
I tried WAIT in PREPARE with other simple queries, targeting other exec nodes (e.g., top-n) with the same result.
I am not sure why our test_failpoints.py or test_cancellation.py did not catch this.
Attached:
I ran an experiment with a single impalad: I ran the above sequence and then pressed Ctrl+C in the impala shell to cancel the query. At that point, I collected the stacks of all threads.
Interesting stacks:
Thread 3 (Thread 0x7fda1238c700 (LWP 8872)):
#0 0x00007fda9b97183d in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fda9b9716dc in sleep () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00000000016ab891 in impala::ExecNode::ExecDebugAction (this=0xc965800, phase=impala::TExecNodePhase::PREPARE, state=0xc965100) at /home/abehm/impala/be/src/exec/exec-node.cc:430
#3 0x00000000016a8c4c in impala::ExecNode::Prepare (this=0xc965800, state=0xc965100) at /home/abehm/impala/be/src/exec/exec-node.cc:148
#4 0x00000000017ee510 in impala::ScanNode::Prepare (this=0xc965800, state=0xc965100) at /home/abehm/impala/be/src/exec/scan-node.cc:51
#5 0x00000000016dc14b in impala::HdfsScanNodeBase::Prepare (this=0xc965800, state=0xc965100) at /home/abehm/impala/be/src/exec/hdfs-scan-node-base.cc:175
#6 0x00000000016d3516 in impala::HdfsScanNode::Prepare (this=0xc965800, state=0xc965100) at /home/abehm/impala/be/src/exec/hdfs-scan-node.cc:167
#7 0x0000000001a716d1 in impala::PlanFragmentExecutor::PrepareInternal (this=0xc9645d0, qs=0x9382800, tdesc_tbl=..., fragment_ctx=..., instance_ctx=...) at /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:215
#8 0x0000000001a6fd69 in impala::PlanFragmentExecutor::Prepare (this=0xc9645d0, query_state=0x9382800, desc_tbl=..., fragment_ctx=..., instance_ctx=...) at /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:99
#9 0x0000000001a6cce5 in impala::FragmentInstanceState::Exec (this=0xc964300) at /home/abehm/impala/be/src/runtime/fragment-instance-state.cc:64
#10 0x0000000001a783d1 in impala::QueryExecMgr::ExecFInstance (this=0xb870ba0, fis=0xc964300) at /home/abehm/impala/be/src/runtime/query-exec-mgr.cc:110
#11 0x0000000001a7b1fa in boost::_mfi::mf1<void, impala::QueryExecMgr, impala::FragmentInstanceState*>::operator() (this=0xac8ce60, p=0xb870ba0, a1=0xc964300) at /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/mem_fn_template.hpp:165
#12 0x0000000001a7b083 in boost::_bi::list2<boost::_bi::value<impala::QueryExecMgr*>, boost::_bi::value<impala::FragmentInstanceState*> >::operator()<boost::_mfi::mf1<void, impala::QueryExecMgr, impala::FragmentInstanceState*>, boost::_bi::list0> (this=0xac8ce70, f=..., a=...) at /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:313
Thread 2 (Thread 0x7fda10b89700 (LWP 8874)):
#0 0x00007fda9bc7cd84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00000000011c1f6d in boost::condition_variable::wait (this=0xc962be0, m=...) at /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/thread/pthread/condition_variable.hpp:73
#2 0x000000000133caf7 in impala::Promise<impala::Status>::Get (this=0xc962be0) at /home/abehm/impala/be/src/util/promise.h:67
#3 0x0000000001a6ff70 in impala::PlanFragmentExecutor::WaitForOpen (this=0xc9629d0) at /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:108
#4 0x0000000001a38e2f in impala::Coordinator::Wait (this=0xbe72d00) at /home/abehm/impala/be/src/runtime/coordinator.cc:1063
#5 0x000000000152be3c in impala::ImpalaServer::QueryExecState::WaitInternal (this=0x972ac00) at /home/abehm/impala/be/src/service/query-exec-state.cc:666
#6 0x000000000152b960 in impala::ImpalaServer::QueryExecState::Wait (this=0x972ac00) at /home/abehm/impala/be/src/service/query-exec-state.cc:634
#7 0x0000000001547643 in boost::_mfi::mf0<void, impala::ImpalaServer::QueryExecState>::operator() (this=0x7fda10b88d78, p=0x972ac00) at /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/mem_fn_template.hpp:49
#8 0x0000000001547260 in boost::_bi::list1<boost::_bi::value<impala::ImpalaServer::QueryExecState*> >::operator()<boost::_mfi::mf0<void, impala::ImpalaServer::QueryExecState>, boost::_bi::list0> (this=0x7fda10b88d88, f=..., a=...) at /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:253
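To make the interaction between the two stacks easier to reason about, here is a minimal standalone model of it. This is not Impala code. FakeRuntimeState, WaitDebugAction and RunScenario are hypothetical names, and the sketch assumes (the stacks do not confirm this) that the hang occurs because the cancel request never becomes visible to the thread sleeping inside the debug action. One thread polls a cancellation flag in a sleep loop, as Thread 3 appears to do, while another blocks on a promise that is only fulfilled once that loop exits, as Thread 2 appears to do.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for per-query state holding a cancellation flag.
struct FakeRuntimeState {
  std::atomic<bool> cancelled{false};
};

// Hypothetical model of a WAIT debug action: sleep in a loop until the
// query is marked cancelled. Returns true once cancellation is observed.
bool WaitDebugAction(FakeRuntimeState* state) {
  while (!state->cancelled.load()) {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  return true;
}

// Models a fragment whose prepare step runs the WAIT action, plus a waiter
// blocked on a promise that is only set after the prepare step returns.
// Returns true if the waiter was unblocked within `timeout`.
bool RunScenario(bool cancel_reaches_fragment,
                 std::chrono::milliseconds timeout) {
  FakeRuntimeState state;
  std::promise<bool> opened;
  std::future<bool> opened_future = opened.get_future();

  // "Thread 3": stuck in the debug action during the prepare step.
  std::thread fragment([&state, &opened] {
    opened.set_value(WaitDebugAction(&state));
  });

  // Assumption under test: does the cancel request reach the fragment?
  if (cancel_reaches_fragment) state.cancelled.store(true);

  // "Thread 2": waits for the fragment to finish preparing and opening.
  bool unblocked =
      opened_future.wait_for(timeout) == std::future_status::ready;

  // Always release the worker so this sketch itself terminates.
  state.cancelled.store(true);
  fragment.join();
  return unblocked;
}
```

In this model, RunScenario(true, ...) unblocks the waiter, while RunScenario(false, ...) times out. An unbounded wait, like the Promise::Get in Thread 2, would block forever in the second case.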
Attachments
Issue Links
- is duplicated by IMPALA-3789 debug action "PREPARE:WAIT" could cause deadlock and query cannot be cancelled (Resolved)