IMPALA-4038

RPC delays for single query can lead to ImpalaServer not making progress on any queries

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.7.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Distributed Exec
    • Labels:

      Description

      We observed a phenomenon where all Impala queries submitted to an Impala daemon got stuck in the CREATED state. One of the causes was an RPC not timing out (IMPALA-2799), but this was greatly exacerbated by the locking in ImpalaServer, which meant that no queries could make progress.

      We saw threads in the following callstacks.

      Either:
      a) Waiting for query_exec_state_map_lock_
      b) Holding query_exec_state_map_lock_ and waiting for QueryExecState::lock_
       38 __lll_lock_wait,_L_lock_854,pthread_mutex_lock,boost::mutex::lock(),impala::ImpalaServer::GetQueryExecState(impala::TUniqueId,impala::ImpalaServer::ReportExecStatus(impala::TReportExecStatusResult&,,impala::ImpalaInternalServiceProcessor::process_ReportExecStatus(int,,impala::ImpalaInternalServiceProcessor::dispatchCall(apache::thrift::protocol::TProtocol*,,apache::thrift::TDispatchProcessor::process(boost::shared_ptr<apache::thrift::protocol::TProtocol>,,apache::thrift::server::TThreadedServer::Task::run(),impala::ThriftThread::RunRunnable(boost::shared_ptr<apache::thrift::concurrency::Runnable>,,boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,,impala::Thread::SuperviseThread(std::string,boost::detail::thread_data<boost::_bi::bind_t<void,,??,start_thread,clone
       30 __lll_lock_wait,_L_lock_854,pthread_mutex_lock,boost::mutex::lock(),impala::ImpalaServer::GetQueryExecState(impala::TUniqueId,impala::ImpalaServer::QuerySummaryCallback(bool,,impala::Webserver::RenderUrlWithTemplate(std::map<std::string,,impala::Webserver::BeginRequestCallback(sq_connection*,,??,??,??,start_thread,clone
      
      Waiting for Coordinator::lock_. Holds QueryExecState::lock_.
      1 __lll_lock_wait,_L_lock_854,pthread_mutex_lock,boost::mutex::lock(),impala::Coordinator::Cancel(impala::Status,impala::ImpalaServer::QueryExecState::Cancel(impala::Status,impala::ImpalaServer::CancelInternal(impala::TUniqueId,impala::ImpalaServer::UnregisterQuery(impala::TUniqueId,impala::ImpalaServer::CancelQueryUrlCallback(std::map<std::string,,impala::Webserver::RenderUrlWithTemplate(std::map<std::string,,impala::Webserver::BeginRequestCallback(sq_connection*,,??,??,??,start_thread,clone
      
      Waiting for the threads in ExecRemoteFragment(). Holds Coordinator::lock_.
      1 pthread_cond_wait@@GLIBC_2.3.2,boost::condition_variable::wait(boost::unique_lock<boost::mutex>&),impala::Promise<bool>::Get(),impala::Coordinator::StartRemoteFragments(impala::QuerySchedule*),impala::Coordinator::Exec(impala::QuerySchedule&,,impala::ImpalaServer::QueryExecState::ExecQueryOrDmlRequest(impala::TQueryExecRequest,impala::ImpalaServer::QueryExecState::ExecDdlRequest(),impala::ImpalaServer::QueryExecState::Exec(impala::TExecRequest*),impala::ImpalaServer::ExecuteInternal(impala::TQueryCtx,impala::ImpalaServer::Execute(impala::TQueryCtx*,,impala::ImpalaServer::query(beeswax::QueryHandle&,,beeswax::BeeswaxServiceProcessor::process_query(int,,beeswax::BeeswaxServiceProcessor::dispatchCall(apache::thrift::protocol::TProtocol*,,apache::thrift::TDispatchProcessor::process(boost::shared_ptr<apache::thrift::protocol::TProtocol>,,apache::thrift::server::TThreadPoolServer::Task::run(),apache::thrift::concurrency::ThreadManager::Worker::run(),impala::ThriftThread::RunRunnable(boost::shared_ptr<apache::thrift::concurrency::Runnable>,,boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,,impala::Thread::SuperviseThread(std::string,boost::detail::thread_data<boost::_bi::bind_t<void,,??,start_thread,clone
      
      Waiting on the network. These threads hold FragmentExecStatus::lock_; the previous thread is waiting for them to complete.
      2 read,??,BIO_read,ssl23_read_bytes,ssl23_connect,apache::thrift::transport::TSSLSocket::checkHandshake(),apache::thrift::transport::TSSLSocket::write(unsigned,apache::thrift::transport::TBufferedTransport::flush(),apache::thrift::transport::TSaslTransport::sendSaslMessage(apache::thrift::transport::NegotiationStatus,,apache::thrift::transport::TSaslClientTransport::handleSaslStartMessage(),apache::thrift::transport::TSaslTransport::open(),impala::ThriftClientImpl::Open(),impala::ThriftClientImpl::OpenWithRetry(unsigned,impala::ClientCacheHelper::CreateClient(impala::TNetworkAddress,impala::ClientCacheHelper::GetClient(impala::TNetworkAddress,impala::ClientConnection<impala::ImpalaBackendClient>::ClientConnection(impala::ClientCache<impala::ImpalaBackendClient>*,,impala::Coordinator::ExecRemoteFragment(impala::FragmentExecParams,boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,,impala::CallableThreadPool::Worker(int,,impala::ThreadPool<boost::function<void,impala::Thread::SuperviseThread(std::string,boost::detail::thread_data<boost::_bi::bind_t<void,,??,start_thread,clone
      

      The basic problem is a mismatch between the durations the two sets of locks are meant to be held for. ImpalaServer::query_exec_state_map_lock_ and QueryExecState::lock_ must be acquired at various points for any query to make progress, so it's important that they're only held for short critical sections. Coordinator::lock_ is held for long durations during RPCs in the coordinator, but holding it doesn't by itself prevent other queries from making progress, so that is acceptable.

      So it's bad if a thread tries to acquire Coordinator::lock_ while holding either of the other locks: it may wait a long time for Coordinator::lock_, and for that entire time it blocks progress for every other query on the server.
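
      As an illustration only (simplified stand-in classes that borrow the real member names, not Impala source), the chain looks roughly like this:

      // Minimal sketch of the lock-ordering problem described above.
      #include <mutex>

      struct Coordinator {
        std::mutex lock_;  // Held for long stretches, e.g. across fragment-exec RPCs.

        void Cancel() {
          std::lock_guard<std::mutex> l(lock_);  // Can block for the full RPC duration.
          // ... cancel remote fragments over the network ...
        }
      };

      struct QueryExecState {
        std::mutex lock_;  // Must only guard short critical sections.
        Coordinator* coordinator_ = nullptr;

        // Problematic pattern: Coordinator::lock_ is acquired while lock_ is held,
        // so one slow RPC inside Cancel() pins lock_ for its entire duration.
        void CancelWhileHoldingLock() {
          std::lock_guard<std::mutex> l(lock_);
          if (coordinator_ != nullptr) coordinator_->Cancel();
        }
      };

      // Meanwhile, ReportExecStatus() and the web UI callbacks take
      // ImpalaServer::query_exec_state_map_lock_ and then wait on the stuck
      // QueryExecState::lock_, so every other query on the daemon stalls behind
      // one slow RPC.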

        Issue Links

          Activity

          tarmstrong Tim Armstrong added a comment -

          I think we should try to fix this and IMPALA-4037 together since the underlying problem (lock ordering/acquisition) is the same.

          henryr Henry Robinson added a comment -

          IIRC the problem is the way you can hold both the map lock and wait for the exec state lock. There's no good reason for GetQueryExecState to allow callers to do this, and we've seen other situations where using the web pages to look at the query state for a query that's loading metadata can block all query submissions.

          I have a patch that removes the 'should lock the exec state' argument from GetQueryExecState. It's not active because I didn't want to focus on it for 2.7 but I will pick it up this week: https://gerrit.cloudera.org/#/c/3681/
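
          For illustration, a minimal sketch (assumed member names and shapes, not the actual patch) of a lookup that never waits on an exec state while the map lock is held:

          // Hedged sketch: the map lock only guards the lookup; callers lock the
          // returned exec state after the map lock has been released.
          #include <map>
          #include <memory>
          #include <mutex>

          struct QueryExecState { std::mutex lock_; /* ... */ };

          std::mutex query_exec_state_map_lock_;
          std::map<int, std::shared_ptr<QueryExecState>> query_exec_state_map_;  // int stands in for TUniqueId

          std::shared_ptr<QueryExecState> GetQueryExecState(int query_id) {
            std::lock_guard<std::mutex> l(query_exec_state_map_lock_);
            auto it = query_exec_state_map_.find(query_id);
            return it == query_exec_state_map_.end() ? nullptr : it->second;
          }

          With this shape, a query whose exec state lock is held for a long time (for example while loading metadata) can no longer block lookups, and therefore submissions, for every other query.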

          tarmstrong Tim Armstrong added a comment -

          I think Henry's patch will help here in that it breaks one part of the dependency chain. I also think that CancelInternal() shouldn't hold Coordinator::lock_ and QueryExecState::lock_ at the same time, so I'm going to put together a fix for that.
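
          A minimal sketch (illustrative names, not the eventual change) of dropping QueryExecState::lock_ before the slow cancel call:

          // Hedged sketch: the exec state lock only covers the flag flip and the
          // pointer read; the RPC-heavy Coordinator::Cancel() runs without it.
          #include <mutex>

          struct Coordinator { void Cancel() { /* RPCs to backends; assumed idempotent */ } };

          struct QueryExecState {
            std::mutex lock_;
            bool is_cancelled_ = false;
            Coordinator* coordinator_ = nullptr;

            void Cancel() {
              Coordinator* coord = nullptr;
              {
                std::lock_guard<std::mutex> l(lock_);  // short critical section only
                if (is_cancelled_) return;
                is_cancelled_ = true;
                coord = coordinator_;
              }
              // lock_ is released here, so threads in ReportExecStatus() or the web
              // UI are not blocked behind the network calls below.
              if (coord != nullptr) coord->Cancel();
            }
          };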

          sailesh Sailesh Mukil added a comment -

          Along with the timeouts, fixing IMPALA-2864 will also help here.

          tarmstrong Tim Armstrong added a comment -

          IMPALA-4037,IMPALA-4038: fix locking during query cancellation

          • Refactor the child query handling out of QueryExecState and clarify
            locking rules.
          • Avoid holding QueryExecState::lock_ while calling
            Coordinator::Cancel() or ChildQuery::Cancel(), which can both do RPCs
            or acquire ImpalaServer::query_exec_state_map_lock_.
          • Fix a potential race between QueryExecState::Exec() and
            QueryExecState::Cancel() where the cancelling thread did an unlocked
            read of the 'coordinator_' field and may not have cancelled the
            coordinator.

          Testing:
          Ran exhaustive build, ran local stress test for a bit.
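
          As a rough illustration of the third bullet above (assumed structure, not the actual change): both the publication and the read of 'coordinator_' happen under QueryExecState::lock_, and Exec() re-checks the cancellation flag after publishing, so a concurrent Cancel() is never lost.

          // Hedged sketch of closing the Exec()/Cancel() race; Coordinator::Cancel()
          // is assumed to be idempotent.
          #include <memory>
          #include <mutex>

          struct Coordinator { void Cancel() { /* cancel remote fragments */ } };

          struct QueryExecState {
            std::mutex lock_;
            bool is_cancelled_ = false;
            std::unique_ptr<Coordinator> coordinator_;

            void Exec() {
              auto coord = std::make_unique<Coordinator>();
              // ... start fragments ...
              bool cancelled;
              {
                std::lock_guard<std::mutex> l(lock_);
                coordinator_ = std::move(coord);  // publish under lock_
                cancelled = is_cancelled_;
              }
              if (cancelled) coordinator_->Cancel();  // Cancel() ran before publication
            }

            void Cancel() {
              Coordinator* coord;
              {
                std::lock_guard<std::mutex> l(lock_);
                is_cancelled_ = true;
                coord = coordinator_.get();  // read under lock_; may still be nullptr
              }
              if (coord != nullptr) coord->Cancel();
            }
          };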

          laszlog Laszlo Gaal added a comment -

          Code review for this bug is: https://gerrit.cloudera.org/#/c/4163/


            People

            • Assignee:
              tarmstrong Tim Armstrong
              Reporter:
              tarmstrong Tim Armstrong
            • Votes:
              0
              Watchers:
              8
