IMPALA-5202

Debug action WAIT in PREPARE leads to hung query that cannot be cancelled.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 3.1.0
    • Component/s: Backend, Infrastructure
    • Labels: None

    Description

      I believe recent changes to coordination and distributed execution have broken the WAIT debug action when called in some phases, e.g. PREPARE.

      The following repro leads to a hung query that cannot be cancelled. Impala's web UI also hangs, so the query cannot be cancelled from there either.

      set debug_action="0:PREPARE:WAIT";
      select 1 from functional.alltypes;
      

      I tried WAIT in PREPARE with other simple queries, targeting other exec nodes (e.g., top-n) with the same result.

      I am not sure why our test_failpoints.py or test_cancellation.py did not catch this.

      Attached:
      I ran an experiment with a single impalad: I ran the above sequence and then pressed Ctrl+C in the impala shell to cancel the query. At that point, I collected the stacks of all threads.

      Interesting stacks:

      Thread 3 (Thread 0x7fda1238c700 (LWP 8872)):
      #0  0x00007fda9b97183d in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
      #1  0x00007fda9b9716dc in sleep () from /lib/x86_64-linux-gnu/libc.so.6
      #2  0x00000000016ab891 in impala::ExecNode::ExecDebugAction (this=0xc965800, phase=impala::TExecNodePhase::PREPARE, state=0xc965100) at /home/abehm/impala/be/src/exec/exec-node.cc:430
      #3  0x00000000016a8c4c in impala::ExecNode::Prepare (this=0xc965800, state=0xc965100) at /home/abehm/impala/be/src/exec/exec-node.cc:148
      #4  0x00000000017ee510 in impala::ScanNode::Prepare (this=0xc965800, state=0xc965100) at /home/abehm/impala/be/src/exec/scan-node.cc:51
      #5  0x00000000016dc14b in impala::HdfsScanNodeBase::Prepare (this=0xc965800, state=0xc965100) at /home/abehm/impala/be/src/exec/hdfs-scan-node-base.cc:175
      #6  0x00000000016d3516 in impala::HdfsScanNode::Prepare (this=0xc965800, state=0xc965100) at /home/abehm/impala/be/src/exec/hdfs-scan-node.cc:167
      #7  0x0000000001a716d1 in impala::PlanFragmentExecutor::PrepareInternal (this=0xc9645d0, qs=0x9382800, tdesc_tbl=..., fragment_ctx=..., instance_ctx=...) at /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:215
      #8  0x0000000001a6fd69 in impala::PlanFragmentExecutor::Prepare (this=0xc9645d0, query_state=0x9382800, desc_tbl=..., fragment_ctx=..., instance_ctx=...) at /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:99
      #9  0x0000000001a6cce5 in impala::FragmentInstanceState::Exec (this=0xc964300) at /home/abehm/impala/be/src/runtime/fragment-instance-state.cc:64
      #10 0x0000000001a783d1 in impala::QueryExecMgr::ExecFInstance (this=0xb870ba0, fis=0xc964300) at /home/abehm/impala/be/src/runtime/query-exec-mgr.cc:110
      #11 0x0000000001a7b1fa in boost::_mfi::mf1<void, impala::QueryExecMgr, impala::FragmentInstanceState*>::operator() (this=0xac8ce60, p=0xb870ba0, a1=0xc964300) at /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/mem_fn_template.hpp:165
      #12 0x0000000001a7b083 in boost::_bi::list2<boost::_bi::value<impala::QueryExecMgr*>, boost::_bi::value<impala::FragmentInstanceState*> >::operator()<boost::_mfi::mf1<void, impala::QueryExecMgr, impala::FragmentInstanceState*>, boost::_bi::list0> (this=0xac8ce70, f=..., a=...) at /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:313
      
      Thread 2 (Thread 0x7fda10b89700 (LWP 8874)):
      #0  0x00007fda9bc7cd84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
      #1  0x00000000011c1f6d in boost::condition_variable::wait (this=0xc962be0, m=...) at /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/thread/pthread/condition_variable.hpp:73
      #2  0x000000000133caf7 in impala::Promise<impala::Status>::Get (this=0xc962be0) at /home/abehm/impala/be/src/util/promise.h:67
      #3  0x0000000001a6ff70 in impala::PlanFragmentExecutor::WaitForOpen (this=0xc9629d0) at /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:108
      #4  0x0000000001a38e2f in impala::Coordinator::Wait (this=0xbe72d00) at /home/abehm/impala/be/src/runtime/coordinator.cc:1063
      #5  0x000000000152be3c in impala::ImpalaServer::QueryExecState::WaitInternal (this=0x972ac00) at /home/abehm/impala/be/src/service/query-exec-state.cc:666
      #6  0x000000000152b960 in impala::ImpalaServer::QueryExecState::Wait (this=0x972ac00) at /home/abehm/impala/be/src/service/query-exec-state.cc:634
      #7  0x0000000001547643 in boost::_mfi::mf0<void, impala::ImpalaServer::QueryExecState>::operator() (this=0x7fda10b88d78, p=0x972ac00) at /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/mem_fn_template.hpp:49
      #8  0x0000000001547260 in boost::_bi::list1<boost::_bi::value<impala::ImpalaServer::QueryExecState*> >::operator()<boost::_mfi::mf0<void, impala::ImpalaServer::QueryExecState>, boost::_bi::list0> (this=0x7fda10b88d88, f=..., a=...) at /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:253
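
The two stacks show one thread sleeping inside the debug action during Prepare and another blocked on a promise in WaitForOpen. As an illustration only, here is a minimal standalone C++ sketch of that shape, written so that the wait is cancellable. It is not Impala's code: QueryState, WaitDebugAction, RunFragment and RunAndCancel are hypothetical names, and std::promise stands in for impala::Promise. The assumption it demonstrates is that the sleeping side must observe a cancellation flag and must always fulfil the promise, otherwise the waiting side never wakes up.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for per-query state; not Impala's RuntimeState.
struct QueryState {
  std::atomic<bool> cancelled{false};
};

// Hypothetical WAIT debug action: sleep in short intervals until the
// cancellation flag is observed, then report the outcome.
std::string WaitDebugAction(const QueryState& state) {
  while (!state.cancelled.load()) {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  return "CANCELLED";
}

// Fragment-side thread: runs the debug action during "prepare" and always
// fulfils the promise, so a thread blocked waiting for "open" is released.
void RunFragment(const QueryState& state, std::promise<std::string> opened) {
  std::string status = WaitDebugAction(state);
  opened.set_value(status);
}

// Coordinator-side helper: start the fragment, cancel it, wait for the result.
std::string RunAndCancel() {
  QueryState state;
  std::promise<std::string> opened;
  std::future<std::string> open_status = opened.get_future();
  std::thread fragment(RunFragment, std::cref(state), std::move(opened));

  // Simulate the client's cancel request reaching the query state.
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  state.cancelled.store(true);

  std::string result = open_status.get();  // unblocks once the promise is set
  fragment.join();
  return result;
}
```

In this sketch, RunAndCancel() returns only because the flag is set and the promise is fulfilled on the way out. If the cancel request never reached the flag, both threads would stay blocked, one in the sleep loop and one in get(), which is the same pairing the stacks above show.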
      

      Attachments

        1. stacks.txt.gz
          31 kB
          Alexander Behm

            People

              Assignee: Daniel Hecht (dhecht)
              Reporter: Alexander Behm (alex.behm)