IMPALA-3875

Thrift threaded server hangs in some cases

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Distributed Exec
    • Labels:
      None

      Description

      Hanging looks like this:

      #0  0x000000398340e82d in read () from /lib64/libpthread.so.0
      #1  0x00000039870dea71 in ?? () from /usr/lib64/libcrypto.so.10
      #2  0x00000039870dcdc9 in BIO_read () from /usr/lib64/libcrypto.so.10
      #3  0x0000003989431873 in ssl23_read_bytes () from /usr/lib64/libssl.so.10
      #4  0x000000398942fe63 in ssl23_get_client_hello () from /usr/lib64/libssl.so.10
      #5  0x00000039894302f3 in ssl23_accept () from /usr/lib64/libssl.so.10
      #6  0x00000000015ee4bc in apache::thrift::transport::TSSLSocket::checkHandshake (this=0xf317b00) at src/thrift/transport/TSSLSocket.cpp:228
      #7  0x00000000015ee820 in apache::thrift::transport::TSSLSocket::read (this=0xf317b00, buf=0x7f8a9ea750a0 "@S\247\236\212\177", len=5) at src/thrift/transport/TSSLSocket.cpp:164
      #8  0x00000000015ebc4f in apache::thrift::transport::readAll<apache::thrift::transport::TSocket> (trans=..., buf=0x7f8a9ea750a0 "@S\247\236\212\177", len=5) at src/thrift/transport/TTransport.h:39
      #9  0x0000000000a80228 in apache::thrift::transport::TTransport::readAll (len=5, buf=0x7f8a9ea750a0 "@S\247\236\212\177", this=<optimized out>) at /usr/src/debug/impala-2.3.0-cdh5.5.2/thirdparty/thrift-0.9.0/build/include/thrift/transport/TTransport.h:126
      #10 apache::thrift::transport::TSaslTransport::receiveSaslMessage (this=0xb6a0770, status=0x7f8a9ea752e4, length=0x7f8a9ea752e8) at /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/transport/TSaslTransport.cpp:237
      #11 0x0000000000a7dc84 in apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage (this=0xb6a0770) at /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/transport/TSaslServerTransport.cpp:80
      #12 0x0000000000a8075e in apache::thrift::transport::TSaslTransport::open (this=0xb6a0770) at /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/transport/TSaslTransport.cpp:95
      #13 0x0000000000a7e9c1 in apache::thrift::transport::TSaslServerTransport::Factory::getTransport (this=0xd0edcb0, trans=...) at /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/transport/TSaslServerTransport.cpp:145
      #14 0x00000000015f6f78 in apache::thrift::server::TThreadedServer::serve (this=0xc181420) at src/thrift/server/TThreadedServer.cpp:162
      #15 0x000000000095149c in impala::ThriftServer::ThriftServerEventProcessor::Supervise (this=<optimized out>) at /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/rpc/thrift-server.cc:173
      #16 0x0000000000ae0faa in boost::function0<void>::operator() (this=<optimized out>) at /opt/toolchain/boost-pic-1.55.0/include/boost/function/function_template.hpp:767
      #17 impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*) (name=..., category=..., functor=..., thread_started=0x7fff9af4ca60) at /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/util/thread.cc:314
      #18 0x0000000000ae3250 in boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> >::operator()<void (*)(const std::string&, const std::string&, impala::Thread::ThreadFunctor, impala::Promise<long int>*), boost::_bi::list0> (a=...,
          f=@0xc3747b8: 0xae0df0 <impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*)>, this=0xc3747c0) at /opt/toolchain/boost-pic-1.55.0/include/boost/bind/bind.hpp:457
      #19 boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > >::operator()() (this=0xc3747b8) at /opt/toolchain/boost-pic-1.55.0/include/boost/bind/bind_template.hpp:20
      #20 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > > >::run() (this=0xc374600) at /opt/toolchain/boost-pic-1.55.0/include/boost/thread/detail/thread.hpp:117
      #21 0x0000000000d28c43 in ?? ()
      #22 0x0000003983407aa1 in start_thread () from /lib64/libpthread.so.0
      #23 0x00000039830e893d in clone () from /lib64/libc.so.6
      

      This is very bad: the whole threaded server hangs, because the accept thread blocks in TThreadedServer::serve() on the SSL handshake read and never gets a chance to dispatch the new serving thread via thread->start().

      The impalad then becomes a zombie.
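A toy model of the ordering problem, assuming nothing about Thrift internals beyond what the trace shows: `fake_handshake` is a hypothetical stand-in for the SSL/SASL negotiation, and `accept_loop_busy_time` is an illustrative helper, not a Thrift or Impala function. It only contrasts how long the accept loop is tied up when the handshake runs inline versus inside the per-connection thread; it is not meant as the fix that was adopted.

```cpp
#include <chrono>
#include <thread>
#include <vector>

using Ms = std::chrono::milliseconds;

// Hypothetical stand-in for the SSL/SASL negotiation; a stalled peer is
// modeled as a sleep of the given length.
void fake_handshake(Ms stall) { std::this_thread::sleep_for(stall); }

// Simulates an accept loop over one connection per entry in `stalls` and
// returns how long the loop itself was occupied before it could go back
// to accepting.
Ms accept_loop_busy_time(const std::vector<Ms>& stalls, bool handshake_in_worker) {
    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (Ms stall : stalls) {
        if (handshake_in_worker) {
            // The worker thread absorbs the stall; the loop moves on at once.
            workers.emplace_back([stall] { fake_handshake(stall); });
        } else {
            // Pattern seen in the trace: the handshake runs in the accept
            // thread, so the worker is only started after it returns.
            fake_handshake(stall);
            workers.emplace_back([] {});
        }
    }
    const Ms busy = std::chrono::duration_cast<Ms>(
        std::chrono::steady_clock::now() - start);
    for (std::thread& w : workers) w.join();
    return busy;
}
```

With two peers that each stall for 150 ms, the inline variant keeps the loop busy for at least 300 ms, while the worker variant returns to accepting almost immediately. A peer that never sends its ClientHello corresponds to an unbounded stall in the inline case.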

      Looking at http://github.mtv.cloudera.com/CDH/Impala/blob/cdh5-trunk/be/src/runtime/client-cache.cc#L106-L113,
      we should probably set a socket timeout before calling OpenWithRetry().
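A minimal sketch of what a socket timeout buys here, using plain POSIX sockets rather than Thrift. The helper names `set_recv_timeout` and `read_from_silent_peer` are made up for illustration. The idea is that with `SO_RCVTIMEO` set, a blocking `read()` against a peer that never sends anything fails with `EAGAIN`/`EWOULDBLOCK` instead of blocking forever. In Thrift itself, `TSocket::setRecvTimeout()` and `TSocket::setSendTimeout()` would presumably be the calls to use, but where exactly they belong in the Impala code is not shown here.

```cpp
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <cerrno>

// Sets a receive timeout so that blocking read()/recv() calls on `fd`
// give up after `timeout_ms` milliseconds. Returns true on success.
bool set_recv_timeout(int fd, int timeout_ms) {
    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}

// Reads from a connected socket whose peer stays silent. Returns the errno
// of the failed step, or 0 if the read unexpectedly succeeded.
int read_from_silent_peer(int timeout_ms) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return errno;

    int err = 0;
    if (!set_recv_timeout(fds[0], timeout_ms)) {
        err = errno;
    } else {
        char buf[5];  // same length as the read in frame #7 of the trace
        ssize_t n = read(fds[0], buf, sizeof(buf));
        err = (n < 0) ? errno : 0;  // save errno before close() can change it
    }
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

Calling `read_from_silent_peer(100)` returns after roughly 100 ms with a timeout error, whereas the same read without the timeout would block indefinitely, which matches the hang in frame #0.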

        Attachments

        1. impala_thread_dump.out
          12 kB
          Antoni Ivanov
        2. impala_stacktrace.out
          628 kB
          Antoni Ivanov
        3. impala_connections.out
          20 kB
          Antoni Ivanov

            People

            • Assignee:
              sailesh Sailesh Mukil
              Reporter:
              HuaisiXu Huaisi Xu
            • Votes:
              0
              Watchers:
              10
