Details
-
Improvement
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
Impala 2.11.0, Impala 3.0, Impala 2.12.0
-
None
-
ghx-label-3
Description
In TAcceptQueueServer.cpp, we currently have one thread in the connection_setup_pool for handling connection establishment.
// Only using one thread here is sufficient for performance, and it avoids potential // thread safety issues with the thrift code called in SetupConnection. constexpr int CONNECTION_SETUP_POOL_SIZE = 1; // New - this is the thread pool used to process the internal accept queue. ThreadPool<shared_ptr<TTransport>> connection_setup_pool("setup-server", "setup-worker", CONNECTION_SETUP_POOL_SIZE, FLAGS_accepted_cnxn_queue_depth, [this](int tid, const shared_ptr<TTransport>& item) { this->SetupConnection(item); });
While that makes the code easier to reason about, it also makes Impala less robust in case a client intentionally or unintentionally freezes during connection establishment. For instance, one can telnet to the beeswax or HS2 port of Impala (e.g. telnet localhost:21000) and then leave the telnet session open. In a secure cluster with TLS enabled, Impalad will be stuck in the SSL handshake. Other clients trying to connect to that port (e.g. Impala shell) will hang forever.
Thread 551 (Thread 0x7fddde563700 (LWP 166354)): #0 0x0000003ce2a0e82d in read () from /lib64/libpthread.so.0 #1 0x0000003ce56dea71 in ?? () from /usr/lib64/libcrypto.so.10 #2 0x0000003ce56dcdc9 in BIO_read () from /usr/lib64/libcrypto.so.10 #3 0x0000003ce9a31873 in ssl23_read_bytes () from /usr/lib64/libssl.so.10 #4 0x0000003ce9a2fe63 in ssl23_get_client_hello () from /usr/lib64/libssl.so.10 #5 0x0000003ce9a302f3 in ssl23_accept () from /usr/lib64/libssl.so.10 #6 0x000000000208ebd5 in apache::thrift::transport::TSSLSocket::checkHandshake() () #7 0x000000000208edbc in apache::thrift::transport::TSSLSocket::read(unsigned char*, unsigned int) () #8 0x000000000208b6f3 in unsigned int apache::thrift::transport::readAll<apache::thrift::transport::TSocket>(apache::thrift::transport::TSocket&, unsigned char*, unsigned int) () #9 0x0000000000cb2aa9 in apache::thrift::transport::TSaslTransport::receiveSaslMessage(apache::thrift::transport::NegotiationStatus*, unsigned int*) () #10 0x0000000000cb03e4 in apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage() () #11 0x0000000000cb2c23 in apache::thrift::transport::TSaslTransport::doSaslNegotiation() () #12 0x0000000000cb10b8 in apache::thrift::transport::TSaslServerTransport::Factory::getTransport(boost::shared_ptr<apache::thrift::transport::TTransport>) () #13 0x0000000000b13e47 in apache::thrift::server::TAcceptQueueServer::SetupConnection(boost::shared_ptr<apache::thrift::transport::TTransport>) () #14 0x0000000000b14932 in boost::detail::function::void_function_obj_invoker2<apache::thrift::server::TAcceptQueueServer::serve()::{lambda(int, boost::shared_ptr<apache::thrift::transport::TTransport> const&)#1}, void, int, boost::shared_ptr<apache::thrift::transport::TTransport> const&>::invoke(boost::detail::function::function_buffer&, int, boost::shared_ptr<apache::thrift::transport::TTransport> const&) () #15 0x0000000000b177f9 in impala::ThreadPool<boost::shared_ptr<apache::thrift::transport::TTransport> >::WorkerThread(int) () #16 0x0000000000d602af in impala::Thread::SuperviseThread(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()()>, impala::ThreadDebugInfo const*, impala::Promise<long>*) () #17 0x0000000000d60aaa in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()()>, impala::ThreadDebugInfo const*, impala::Promise<long>*), boost::_bi::list5<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void ()()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long>*> > > >::run() () #18 0x00000000012d756a in thread_proxy () #19 0x0000003ce2a07aa1 in start_thread () from /lib64/libpthread.so.0 #20 0x0000003ce26e893d in clone () from /lib64/libc.so.6
While it's not a complete fix, increasing the number of threads in connection_setup_pool makes it more robust against this kind of problem. Ideally, the number of threads in the thread pool should be configurable.
Attachments
Issue Links
- is related to
-
IMPALA-7638 Lower default timeout for connection setup
- Open
-
IMPALA-8159 TAcceptQueueServer: Caught TException: invalid sasl status
- Resolved
-
IMPALA-7566 TAcceptQueueServer connection setup should have timeout
- Resolved