IMPALA-12439: Impala daemon gets stuck on random executors

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.4.0
    • Fix Version/s: None
    • Component/s: Distributed Exec
    • Labels: None

    Description

      Hi!

      In our cluster we periodically hit the following problem:

      1. A query fails with an error like this: "Exec() rpc failed: Timed out: ExecQueryFInstances RPC to <node_ip>:27000 timed out after 300.000s". The affected node may be different each time the problem occurs.

      2. We analyzed minidumps of the Impala daemon from two separate occurrences (the resolved minidumps are attached). It seems that the Impala daemon is stuck while cancelling a query fragment: the thread below is waiting in libpthread inside impala::QueryState::Cancel(), called from the CancelQueryFInstances RPC handler.

      Thread 244
       0  libpthread-2.17.so + 0xba35
          rax = 0xfffffffffffffe00   rdx = 0x0000000000000002
          rcx = 0xffffffffffffffff   rbx = 0x000000007cd81b10
          rsi = 0x0000000000000080   rdi = 0x000000007cd81b14
          rbp = 0x00007f7ba5ae8580   rsp = 0x00007f7ba5ae8520
           r8 = 0x000000007cd81b00    r9 = 0x0000000000000000
          r10 = 0x0000000000000000   r11 = 0x0000000000000246
          r12 = 0x00000000eafe6400   r13 = 0x00007f7ba5ae85c0
          r14 = 0x00007f845b7287d0   r15 = 0x00007f7ba5ae8660
          rip = 0x00007f845b727a35
          Found by: given as instruction pointer in context
       1  impalad!impala::QueryState::Cancel() + 0xdb
          rbp = 0x00007f7ba5ae8600   rsp = 0x00007f7ba5ae8590
          rip = 0x00000000011791bb
          Found by: previous frame's frame pointer
       2  impalad!impala::ControlService::CancelQueryFInstances(impala::CancelQueryFInstancesRequestPB const*, impala::CancelQueryFInstancesResponsePB*, kudu::rpc::RpcContext*) + 0x177
          rbx = 0x00007f8458e136a0   rbp = 0x00007f7ba5ae8780
          rsp = 0x00007f7ba5ae8610   r12 = 0x00007f7ba5ae8720
          r13 = 0x00007f7ba5ae86a0   rip = 0x0000000001218f77
          Found by: call frame info
       3  impalad!kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) + 0x17c
          rbx = 0x0000000015e4e460   rbp = 0x00007f7ba5ae87e0
          rsp = 0x00007f7ba5ae8790   r12 = 0x00000007a6bf8ee0
          r13 = 0x0000000014f86740   r14 = 0x0000000014f86f00
          r15 = 0x0000000014f87480   rip = 0x0000000001788ffc
          Found by: call frame info
       4  impalad!impala::ImpalaServicePool::RunThread() + 0x1be
          rbx = 0x00007f840000000d   rbp = 0x00007f7ba5ae88a0
          rsp = 0x00007f7ba5ae87f0   r12 = 0x0000000018b30f80
          r13 = 0x0000000000000000   r14 = 0x0000000000000051
          r15 = 0x00007f840000000d   rip = 0x00000000010dbdee
          Found by: call frame info
       5  impalad!impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) + 0x30b
          rbx = 0x00007f7ba5ae8970   rbp = 0x00007f7ba5ae8be0
          rsp = 0x00007f7ba5ae88b0   r12 = 0x00007ffed2cdb298
          r13 = 0x000000000592ee20   r14 = 0x00007f7ba5ae8910
          r15 = 0x00007f8458e136a0   rip = 0x0000000001435f8b
          Found by: call frame info
       6  impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void (std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>), boost::_bi::list5<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::run() + 0x7a
          rbx = 0x0000000015e34e00   rbp = 0x00007f7ba5ae8c40
          rsp = 0x00007f7ba5ae8bf0   r12 = 0x00007f7ba5ae8c00
          r13 = 0x0000000001435c80   r14 = 0x0000000000000000
          r15 = 0x00007f7ba5ae9700   rip = 0x0000000001436e5a
          Found by: call frame info
       7  impalad!thread_proxy + 0xea
          rbx = 0x0000000015e34e00   rbp = 0x0000000000000000
          rsp = 0x00007f7ba5ae8c50   r12 = 0x00007f7ba5ae8c50
          r13 = 0x0000000000801000   r14 = 0x0000000000000000
          r15 = 0x00007f7ba5ae9700   rip = 0x0000000001c18e1a
          Found by: call frame info
       8  libpthread-2.17.so + 0x7ea5
          rbx = 0x0000000000000000   rbp = 0x0000000000000000
          rsp = 0x00007f7ba5ae8ca0   r12 = 0x0000000000000000
          r13 = 0x0000000000801000   r14 = 0x0000000000000000
          r15 = 0x00007f7ba5ae9700   rip = 0x00007f845b723ea5
          Found by: call frame info
       9  libc-2.17.so + 0xfeb0d
          rsp = 0x00007f7ba5ae8d40   rip = 0x00007f8458321b0d
          Found by: stack scanning
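
      To check whether other threads in the attached dumps are blocked in the same place, the resolved text can be scanned for every thread whose stack contains a given frame. The following is only a minimal sketch: it assumes the attachments use the same "Thread N" / "N  module!symbol + 0xoffset" layout as the trace above, and the names threads_by_frame and SAMPLE are hypothetical helpers, not part of Impala or Breakpad. The second thread in SAMPLE is made up purely to show the filtering.

```python
import re

# "Thread 244" (optionally followed by e.g. "(crashed)")
THREAD_RE = re.compile(r"^\s*Thread (\d+)")
# " 1  impalad!impala::QueryState::Cancel() + 0xdb" -> symbol without offset.
# Register lines ("rax = ...") and "Found by:" lines do not start with a digit.
FRAME_RE = re.compile(r"^\s*\d+\s+(.+?)(?:\s\+\s0x[0-9a-fA-F]+)?\s*$")


def threads_by_frame(text, needle):
    """Return {thread_id: [symbols]} for threads whose stack mentions needle."""
    stacks = {}
    current = None
    for line in text.splitlines():
        m = THREAD_RE.match(line)
        if m:
            current = int(m.group(1))
            stacks[current] = []
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            stacks[current].append(m.group(1))
    return {
        tid: frames
        for tid, frames in stacks.items()
        if any(needle in frame for frame in frames)
    }


# Thread 244 is abridged from the trace in this issue.
# Thread 245 is HYPOTHETICAL, added only to demonstrate filtering.
SAMPLE = """
Thread 244
 0  libpthread-2.17.so + 0xba35
    rax = 0xfffffffffffffe00   rdx = 0x0000000000000002
    Found by: given as instruction pointer in context
 1  impalad!impala::QueryState::Cancel() + 0xdb
    rbp = 0x00007f7ba5ae8600   rsp = 0x00007f7ba5ae8590
    Found by: previous frame's frame pointer
 4  impalad!impala::ImpalaServicePool::RunThread() + 0x1be
    Found by: call frame info
Thread 245
 0  libpthread-2.17.so + 0xba35
 1  impalad!thread_proxy + 0xea
"""

blocked = threads_by_frame(SAMPLE, "QueryState::Cancel")
for tid, frames in blocked.items():
    print(f"Thread {tid}: " + " <- ".join(frames))
```

      Running the same function over the contents of resolved_420a96bf.txt and resolved_d7750c55.txt (read with open(path).read()) would show how many service threads are parked in QueryState::Cancel() in each dump, assuming the files follow the layout above.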

      Attachments

        1. resolved_420a96bf.txt
          4.02 MB
          Evgeniy
        2. resolved_d7750c55.txt
          2.96 MB
          Evgeniy

          People

            Assignee: Unassigned
            Reporter: Evgeniy (skevgeniy)