IMPALA-3875

Thrift threaded server hang in some cases

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Distributed Exec
    • Labels:
      None

      Description

      The hang looks like this:

      #0  0x000000398340e82d in read () from /lib64/libpthread.so.0
      #1  0x00000039870dea71 in ?? () from /usr/lib64/libcrypto.so.10
      #2  0x00000039870dcdc9 in BIO_read () from /usr/lib64/libcrypto.so.10
      #3  0x0000003989431873 in ssl23_read_bytes () from /usr/lib64/libssl.so.10
      #4  0x000000398942fe63 in ssl23_get_client_hello () from /usr/lib64/libssl.so.10
      #5  0x00000039894302f3 in ssl23_accept () from /usr/lib64/libssl.so.10
      #6  0x00000000015ee4bc in apache::thrift::transport::TSSLSocket::checkHandshake (this=0xf317b00) at src/thrift/transport/TSSLSocket.cpp:228
      #7  0x00000000015ee820 in apache::thrift::transport::TSSLSocket::read (this=0xf317b00, buf=0x7f8a9ea750a0 "@S\247\236\212\177", len=5) at src/thrift/transport/TSSLSocket.cpp:164
      #8  0x00000000015ebc4f in apache::thrift::transport::readAll<apache::thrift::transport::TSocket> (trans=..., buf=0x7f8a9ea750a0 "@S\247\236\212\177", len=5) at src/thrift/transport/TTransport.h:39
      #9  0x0000000000a80228 in apache::thrift::transport::TTransport::readAll (len=5, buf=0x7f8a9ea750a0 "@S\247\236\212\177", this=<optimized out>) at /usr/src/debug/impala-2.3.0-cdh5.5.2/thirdparty/thrift-0.9.0/build/include/thrift/transport/TTransport.h:126
      #10 apache::thrift::transport::TSaslTransport::receiveSaslMessage (this=0xb6a0770, status=0x7f8a9ea752e4, length=0x7f8a9ea752e8) at /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/transport/TSaslTransport.cpp:237
      #11 0x0000000000a7dc84 in apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage (this=0xb6a0770) at /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/transport/TSaslServerTransport.cpp:80
      #12 0x0000000000a8075e in apache::thrift::transport::TSaslTransport::open (this=0xb6a0770) at /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/transport/TSaslTransport.cpp:95
      #13 0x0000000000a7e9c1 in apache::thrift::transport::TSaslServerTransport::Factory::getTransport (this=0xd0edcb0, trans=...) at /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/transport/TSaslServerTransport.cpp:145
      #14 0x00000000015f6f78 in apache::thrift::server::TThreadedServer::serve (this=0xc181420) at src/thrift/server/TThreadedServer.cpp:162
      #15 0x000000000095149c in impala::ThriftServer::ThriftServerEventProcessor::Supervise (this=<optimized out>) at /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/rpc/thrift-server.cc:173
      #16 0x0000000000ae0faa in boost::function0<void>::operator() (this=<optimized out>) at /opt/toolchain/boost-pic-1.55.0/include/boost/function/function_template.hpp:767
      #17 impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*) (name=..., category=..., functor=..., thread_started=0x7fff9af4ca60) at /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/util/thread.cc:314
      #18 0x0000000000ae3250 in boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> >::operator()<void (*)(const std::string&, const std::string&, impala::Thread::ThreadFunctor, impala::Promise<long int>*), boost::_bi::list0> (a=...,
          f=@0xc3747b8: 0xae0df0 <impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*)>, this=0xc3747c0) at /opt/toolchain/boost-pic-1.55.0/include/boost/bind/bind.hpp:457
      #19 boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > >::operator()() (this=0xc3747b8) at /opt/toolchain/boost-pic-1.55.0/include/boost/bind/bind_template.hpp:20
      #20 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > > >::run() (this=0xc374600) at /opt/toolchain/boost-pic-1.55.0/include/boost/thread/detail/thread.hpp:117
      #21 0x0000000000d28c43 in ?? ()
      #22 0x0000003983407aa1 in start_thread () from /lib64/libpthread.so.0
      #23 0x00000039830e893d in clone () from /lib64/libc.so.6
      

      This is very bad: the whole threaded server accept thread hangs because it never gets a chance to dispatch a new serving thread via thread->start().

      This impalad becomes a zombie.
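
      For illustration, here is a simplified sketch of what TThreadedServer::serve() does per connection, reconstructed from the stack trace above (frames #12-#14). This is not the actual Thrift source; AcceptLoopSketch and its parameters are hypothetical names, and it assumes Thrift 0.9.x (boost::shared_ptr). It only shows why a stuck handshake stalls the accept loop:

      #include <thrift/concurrency/Thread.h>
      #include <thrift/transport/TServerTransport.h>
      #include <thrift/transport/TTransport.h>
      #include <boost/shared_ptr.hpp>

      using namespace apache::thrift::concurrency;
      using namespace apache::thrift::transport;

      void AcceptLoopSketch(TServerTransport& server_transport,
                            TTransportFactory& transport_factory,
                            ThreadFactory& thread_factory,
                            boost::shared_ptr<Runnable> task) {
        while (true) {
          // 1. Accept the raw connection (a TSocket, or TSSLSocket when SSL is enabled).
          boost::shared_ptr<TTransport> client = server_transport.accept();

          // 2. Wrap it. For TSaslServerTransport::Factory this calls
          //    TSaslTransport::open(), which performs the SASL negotiation and,
          //    underneath it, the SSL handshake (frames #0-#12 above). With no
          //    socket timeout, a peer that never completes the handshake blocks
          //    here forever.
          boost::shared_ptr<TTransport> transport = transport_factory.getTransport(client);

          // 3. Only after the handshake succeeds is a thread started to serve the
          //    connection (in the real server, a per-connection Task wrapping the
          //    processor). A hang in step 2 stalls this single loop, so the impalad
          //    stops accepting any new clients.
          boost::shared_ptr<Thread> worker = thread_factory.newThread(task);
          worker->start();
        }
      }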

      From http://github.mtv.cloudera.com/CDH/Impala/blob/cdh5-trunk/be/src/runtime/client-cache.cc#L106-L113,
      we should probably set the socket timeout before OpenWithRetry().
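
      A minimal sketch of that suggestion, using Thrift's standard TSocket timeout setters; MakeClientTransport and the 5-minute values are illustrative only, not the actual patch:

      #include <string>
      #include <thrift/transport/TBufferTransports.h>
      #include <thrift/transport/TSocket.h>
      #include <boost/shared_ptr.hpp>

      using apache::thrift::transport::TBufferedTransport;
      using apache::thrift::transport::TSocket;
      using apache::thrift::transport::TTransport;

      // Create the client socket with bounded connect/send/recv timeouts (in
      // milliseconds) *before* it is opened, so a peer that never completes the
      // handshake produces a timeout error instead of an indefinite hang.
      boost::shared_ptr<TTransport> MakeClientTransport(const std::string& host, int port) {
        boost::shared_ptr<TSocket> socket(new TSocket(host, port));
        socket->setConnTimeout(5 * 60 * 1000);
        socket->setSendTimeout(5 * 60 * 1000);
        socket->setRecvTimeout(5 * 60 * 1000);
        // The wrapped transport is then handed to the SASL/SSL layers and
        // OpenWithRetry() as usual.
        return boost::shared_ptr<TTransport>(new TBufferedTransport(socket));
      }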

        Activity

        Sailesh Mukil added a comment -

        Commit in: https://github.com/apache/incubator-impala/commit/ff629c2deb97b4ef25e80745bf4689dcbe8407fe

        Antoni added a comment -

        @Sailesh

        Hi,

        We've recently encountered similar hangs in Impala. Looking at the stack trace, it's similar to the one posted in this bug.
        So I was wondering, do you know under what circumstances this can happen?

        Thanks

        Sailesh Mukil added a comment -

        Antoni Although rare, there are a few reasons why they may happen:

        • Very slow progress on the peer, causing this node to stall as well.
        • Peer itself is hung due to some other bug, causing it not to respond to the node in question.

        Most of the hangs have been fixed in newer versions of Impala. However, if you wish to ensure progress (vs. a hang) on an older version of Impala, backporting the above patch should at least ensure graceful failures.

        The commit message in the patch briefly explains how it would help.

        Antoni Ivanov added a comment -

        @Sailesh
        Thanks,

        Btw, I failed to mention that it seems only the "frontend" part hangs - basically I cannot submit queries against the node, open the Web UI (port 25000), or start impala-shell (it hangs indefinitely).
        But looking at the logs, there seem to be fragments getting executed. Is this possible with this bug?

        Sailesh Mukil added a comment -

        Antoni Yes, we have seen before that this is possible if all the client connection handlers are hung. This will still allow internal communication since the backend threads are progressing, but the client handlers (threads) can make no progress due to the hang.

        You can have a look at the following numbers from the WebUI (if it's accessible):

        "impala.thrift-server.beeswax-frontend.connections-in-use"
        "impala.thrift-server.hiveserver2-frontend.connections-in-use"

        By default, we allow only 64 threads to service client requests per impalad.

        This limit can be increased or decreased by modifying the impalad startup flag "fe_service_threads".

        Antoni Ivanov added a comment -

        @Sailesh
        Thanks again,

        Looking at our telemetry (we regularly collect everything under node:25000/jsonmetrics?json),
        it seems that impala.thrift-server.beeswax-frontend.connections-in-use was at "3" before it stopped reporting (because the Web UI became unresponsive), but impala.thrift-server.beeswax-frontend.total-connections is 3500 (almost 3 times more than on the next node).

        Also, on that node we execute a lot of impala-shell queries, though usually one after another.

        Is it possible that impala-shell is leaking connections and not closing them properly?

        Sailesh Mukil added a comment -

        Antoni Ivanov I doubt that the shell would leak connections, since we haven't seen that happen before. A possibility is that the number of connections-in-use spiked after the last telemetry report?

        If you execute more queries on that node, then it makes sense that the "..total-connections" counter is higher (total-connections is the total number of connections made since the impalad started).

        Antoni Ivanov added a comment -

        Sailesh Mukil,

        Running lsof to list all open network files of the impalad process:
        lsof -P -n -i -a -p impalad_pid | grep ':25000' | wc -l
        This shows 72 (on all other nodes it's just 1, the 25000 listen port).

        They are all in state CLOSE_WAIT, e.g.:

        impalad 4846 impala 996u IPv4 548599405 0t0 TCP 127.0.0.1:25000->127.0.0.1:36856 (CLOSE_WAIT)
        impalad 4846 impala 997u IPv4 548599466 0t0 TCP 127.0.0.1:25000->127.0.0.1:36862 (CLOSE_WAIT)

        There appears to be no process that belongs to that other port (netstat -tulpn | grep 36862).

        Thanks again!

        Antoni Ivanov added a comment -

        Right, sorry, of course there wouldn't be another process.
        So it appears that although the connection is closed on the other end, Impala is still holding it open, which is perhaps this bug?

        Antoni Ivanov added a comment -

        Hi,

        Upon closer inspection (I compared the stack traces of the good nodes vs. the bad one), it appears that the hanging threads look like the one below, which seems to be https://issues.apache.org/jira/browse/IMPALA-2799

        #0 0x00007ff9d775d264 in __lll_lock_wait () from /lib64/libpthread.so.0
        #1 0x00007ff9d7758508 in _L_lock_854 () from /lib64/libpthread.so.0
        #2 0x00007ff9d77583d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
        #3 0x000000000082e778 in boost::mutex::lock() ()
        #4 0x0000000000af879c in impala::ImpalaHttpHandler::InflightQueryIdsHandler(std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >*) ()
        #5 0x0000000000bfd045 in impala::Webserver::RenderUrlWithTemplate(std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, impala::Webserver::UrlHandler const&, std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >, impala::ContentType) ()
        #6 0x0000000000bfe465 in impala::Webserver::BeginRequestCallback(sq_connection*, sq_request_info*) ()
        #7 0x0000000000c119b0 in ?? ()
        #8 0x0000000000c1412d in ?? ()
        #9 0x0000000000c147bd in ?? ()
        #10 0x00007ff9d77569d1 in start_thread () from /lib64/libpthread.so.0
        #11 0x00007ff9d74a38fd in clone () from /lib64/libc.so.6

        Sailesh Mukil added a comment -

        Antoni Ivanov Yes, that seems to point in the direction of our above diagnosis. However, IMPALA-2799 is a duplicate of this issue. There are some fixes that went in to ensure that this deadlock doesn't happen, which can be found in newer versions of Impala.

        Antoni Ivanov added a comment - edited

        Thanks for the help.
        Upgrading is too costly for us to do quickly at the moment (since we follow CDH upgrades, which upgrade everything).

        For reference, here is what we did to manage the issue:

        • We found the processes that are polling impalad port 25000 for statistics (impalad:25000/jsonmetrics?json). We have a few agents (monitoring and haproxy agents) and we stopped them. It also occurred a few times with the Cloudera Manager agent, but we didn't stop that one; we simply restarted Impala, since we are not sure what depends on it.

        • We suspect that a configuration change made to align better with Hadoop recommendations - sysctl net.core.somaxconn from 128 to 1024 (and changing ifconfig eth0 txqueuelen from 1000 to 4000) - may have worsened things and started causing this issue. But we haven't confirmed this.

          People

          • Assignee: Sailesh Mukil
          • Reporter: Huaisi Xu
          • Votes: 0
          • Watchers: 10
