Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-5926

Avoid printing expensive stack when closing a session

    XMLWordPrintableJSON

Details

    • ghx-label-9

    Description

      Noticed that session teardown is very slow due to printing expensive call stack, this issue is very similar to IMPALA-5275.

      From https://github.com/apache/incubator-impala/blob/master/be/src/service/impala-server.cc#L1116

       // Unregister all open queries from this session.
        Status status("Session closed");
      

      From vtune

      CPU Time
      1 of 8: 72.4% (51.420s of 71.065s)
      
      impalad ! google::OpenObjectFileContainingPcAndGetStartAddress - [unknown source file]
      impalad ! google::SymbolizeAndDemangle + 0x488 - [unknown source file]
      impalad ! google::DumpStackTrace.constprop.9 + 0x6e - [unknown source file]
      impalad ! impala::GetStackTrace + 0x1e - debug-util.cc:326
      impalad ! impala::Status::Status + 0x68 - status.cc:122
      impalad ! impala::ImpalaServer::CloseSessionInternal + 0x1f9 - impala-server.cc:1116
      impalad ! impala::ImpalaServer::CloseSession + 0x11d - impala-hs2-server.cc:383
      impalad ! apache::hive::service::cli::thrift::TCLIServiceProcessor::process_CloseSession + 0x215 - TCLIService.cpp:4587
      impalad ! apache::hive::service::cli::thrift::TCLIServiceProcessor::dispatchCall + 0x113 - TCLIService.cpp:4506
      impalad ! apache::thrift::TDispatchProcessor::process + 0xab - TDispatchProcessor.h:121
      impalad ! apache::thrift::server::TThreadPoolServer::Task::run + 0x22a - [unknown source file]
      impalad ! apache::thrift::concurrency::ThreadManager::Worker::run + 0x2c8 - [unknown source file]
      impalad ! impala::ThriftThread::RunRunnable + 0x68 - thrift-thread.cc:74
      impalad ! boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long>*>::operator() + 0x3a - mem_fn_template.hpp:280
      impalad ! operator()<boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<long unsigned int>*>, boost::_bi::list0> + 0x1e - bind.hpp:392
      impalad ! boost::_bi::bind_t<void, boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long>*>, boost::_bi::list3<boost::_bi::value<impala::ThriftThread*>, boost::_bi::value<boost::shared_ptr<apache::thrift::concurrency::Runnable>>, boost::_bi::value<impala::Promise<unsigned long>*>>>::operator() - bind_template.hpp:20
      impalad ! boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long>*>, boost::_bi::list3<boost::_bi::value<impala::ThriftThread*>, boost::_bi::value<boost::shared_ptr<apache::thrift::concurrency::Runnable>>, boost::_bi::value<impala::Promise<unsigned long>*>>>, void>::invoke + 0x9 - function_template.hpp:153
      impalad ! boost::function0<void>::operator() + 0x1a - function_template.hpp:767
      impalad ! impala::Thread::SuperviseThread + 0x1a7 - thread.cc:352
      impalad ! operator()<void (*)(const std::basic_string<char>&, const std::basic_string<char>&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list0> + 0x5a - bind.hpp:457
      impalad ! boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void (void)>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void (void)>>, boost::_bi::value<impala::Promise<long>*>>>::operator() - bind_template.hpp:20
      impalad ! boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void (void)>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void (void)>>, boost::_bi::value<impala::Promise<long>*>>>>::run + 0x19 - thread.hpp:116
      impalad ! thread_proxy + 0xd9 - [unknown source file]
      libpthread-2.17.so ! start_thread + 0xc4 - [unknown source file]
      libc-2.17.so ! __clone + 0x6c - [unknown source file]
      
      I0912 18:05:28.278861 32644 status.cc:122] Session closed
          @           0x8a1d59  impala::Status::Status()
          @           0xb0319a  impala::ImpalaServer::CloseSessionInternal()
          @           0xb2be3e  impala::ImpalaServer::CloseSession()
          @           0xdecc76  apache::hive::service::cli::thrift::TCLIServiceProcessor::process_CloseSession()
          @           0xdf89d4  apache::hive::service::cli::thrift::TCLIServiceProcessor::dispatchCall()
          @           0x8719cc  apache::thrift::TDispatchProcessor::process()
          @          0x1c2cd6b  apache::thrift::server::TThreadPoolServer::Task::run()
          @          0x1c14709  apache::thrift::concurrency::ThreadManager::Worker::run()
          @           0xa3a239  impala::ThriftThread::RunRunnable()
          @           0xa3b012  boost::detail::function::void_function_obj_invoker0<>::invoke()
          @           0xc3c0c2  impala::Thread::SuperviseThread()
          @           0xc3c824  boost::detail::thread_data<>::run()
          @           0xf37a7a  thread_proxy
          @     0x7fde88d22dc5  start_thread
          @     0x7fde88a5176d  __clone
      

      When there is a large number of queries finishing at the same time this becomes more problematic as printing the stack trace acquires a lock which blocks other queries trying to close their sessions.
      A portion of this time shows up under "Unregister query" in the query timeline.

          Query Timeline: 2m7s
             - Query submitted: 84.265us (84.265us)
             - Planning finished: 622.838ms (622.754ms)
             - Submit for admission: 664.240ms (41.401ms)
             - Completed admission: 3s622ms (2s958ms)
             - Ready to start on 7 backends: 3s716ms (94.521ms)
             - All 7 execution backends (8 fragment instances) started: 6s835ms (3s118ms)
             - Rows available: 21s924ms (15s088ms)
             - First row fetched: 22s187ms (263.418ms)
             - Unregister query: 2m7s (1m45s)
           - ComputeScanRangeAssignmentTimer: 26.472ms
      

      This was captured when running 128 concurrent queries against the same coordinator.

      Attachments

        Activity

          People

            mmokhtar Mostafa Mokhtar
            mmokhtar Mostafa Mokhtar
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: