Details
Bug
Status: Resolved
Minor
Resolution: Duplicate
Impala 1.4.1
Description
Our cluster sometimes stops responding to new connections. A stack trace shows that the server's accept thread is blocked, waiting for SASL setup to complete.
The problem is that there is only one acceptor thread; if this thread blocks, the server cannot accept new connections.
Thread 42 (Thread 0x7f03ac2e6700 (LWP 15497)):
#0 0x00007f046bb6994c in recv () from /lib64/libpthread.so.0
#1 0x0000000001284f5d in apache::thrift::transport::TSocket::read(unsigned char*, unsigned int) ()
#2 0x0000000001286caf in unsigned int apache::thrift::transport::readAll<apache::thrift::transport::TSocket>(apache::thrift::transport::TSocket&, unsigned char*, unsigned int) ()
#3 0x000000000099c9e8 in apache::thrift::transport::TSaslTransport::receiveSaslMessage(apache::thrift::transport::NegotiationStatus*, unsigned int*) ()
#4 0x000000000099a1b4 in apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage() ()
#5 0x000000000099cebe in apache::thrift::transport::TSaslTransport::open() ()
#6 0x000000000099b349 in apache::thrift::transport::TSaslServerTransport::Factory::getTransport(boost::shared_ptr<apache::thrift::transport::TTransport>) ()
#7 0x000000000128f003 in apache::thrift::server::TThreadPoolServer::serve() ()
#8 0x000000000089390c in impala::ThriftServer::ThriftServerEventProcessor::Supervise() ()
#9 0x00000000009c8ff8 in impala::Thread::SuperviseThread(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()()>, impala::Promise<long>*) ()
#10 0x00000000009c9c00 in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void ()()> >, boost::_bi::value<impala::Promise<long>*> > > >::run() ()
#11 0x0000000000bb4ff3 in thread_proxy ()
#12 0x00007f046bb62851 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f046aea411d in clone () from /lib64/libc.so.6
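To reproduce the bug, open an idle TCP connection that never sends a SASL message, then try to connect with impala-shell: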
nc <impalad host> <port> &
./impala-shell -k -i <impalad host>:<port>
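The same failure mode can be imitated without Impala or Thrift. The following is a minimal Python sketch, not Impala code: it assumes a toy server on plain sockets whose single accept loop reads a handshake message inline, in the way the stack trace shows SASL setup running inside the accept thread. The names `accept_loop`, `try_connect`, `demo`, and `HANDSHAKE_LEN` are hypothetical and exist only for illustration.

```python
import socket
import threading

HANDSHAKE_LEN = 5  # hypothetical stand-in for the first SASL message


def accept_loop(listener):
    """Single acceptor: the handshake read runs inline, before the next accept()."""
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener was closed
        with conn:
            try:
                data = conn.recv(HANDSHAKE_LEN)  # blocks while the peer stays silent
                if data:
                    conn.sendall(b"ok")
            except OSError:
                pass  # peer went away; move on to the next connection


def try_connect(port, timeout):
    """Connect, send a handshake, and report whether the server answered in time."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as s:
        s.sendall(b"hello")
        try:
            return s.recv(2) == b"ok"
        except socket.timeout:
            return False


def demo():
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(5)
    port = listener.getsockname()[1]
    threading.Thread(target=accept_loop, args=(listener,), daemon=True).start()

    before = try_connect(port, timeout=2.0)  # healthy server answers
    # Like `nc <host> <port> &`: connect and never send anything.
    idle = socket.create_connection(("127.0.0.1", port))
    during = try_connect(port, timeout=1.0)  # acceptor is stuck in recv() on `idle`
    idle.close()  # silent peer disconnects, so recv() returns b""
    after = try_connect(port, timeout=2.0)  # acceptor is free again
    listener.close()
    return before, during, after
```

Under these assumptions, `demo()` is expected to return `(True, False, True)`: the second client gets no answer for as long as the idle connection holds the acceptor in `recv()`.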
A quick fix is to add a timeout in TSaslTransport::open(). A better fix is to move the SASL setup logic out of the accept thread, though this may be more complex because user information is needed after connection setup.
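Both ideas can be sketched on a toy server. This is an illustration in Python with plain sockets, not Thrift's or Impala's API: `handshake`, `accept_loop`, `answered_despite_idle_peer`, and the `offload` flag are hypothetical names, and the 5-byte read is only a stand-in for the first SASL message. With `offload=False` the handshake stays in the accept thread but is bounded by a socket timeout; with `offload=True` it runs in a separate thread per connection, so the acceptor never waits on it.

```python
import socket
import threading


def handshake(conn, timeout):
    """Read the first message; `timeout` (seconds, or None) bounds the wait."""
    with conn:
        conn.settimeout(timeout)
        try:
            data = conn.recv(5)  # stand-in for receiving the SASL start message
            if data:
                conn.sendall(b"ok")
        except OSError:
            pass  # timed out or peer went away; drop the connection


def accept_loop(listener, timeout, offload):
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener was closed
        if offload:
            # Option 2: run the handshake off the accept thread.
            threading.Thread(
                target=handshake, args=(conn, timeout), daemon=True
            ).start()
        else:
            # Option 1: keep it inline, bounded by the socket timeout.
            handshake(conn, timeout)


def answered_despite_idle_peer(timeout, offload):
    """Return True if a real client is served while an idle peer is connected."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(5)
    port = listener.getsockname()[1]
    threading.Thread(
        target=accept_loop, args=(listener, timeout, offload), daemon=True
    ).start()
    idle = socket.create_connection(("127.0.0.1", port))  # never sends anything
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=3.0) as s:
            s.sendall(b"hello")
            try:
                return s.recv(2) == b"ok"
            except socket.timeout:
                return False
    finally:
        idle.close()
        listener.close()
```

For example, `answered_despite_idle_peer(0.3, offload=False)` exercises the timeout variant and `answered_despite_idle_peer(None, offload=True)` the offloaded one. The timeout variant still delays every later client by up to the timeout for each silent peer ahead of it, which is one reason moving the handshake out of the accept thread is the more thorough fix.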
Issue Links
- duplicates IMPALA-5394 Set socket timeouts while opening TSaslTransport (Resolved)