Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-1972

Queries that take a long time to plan can cause webserver to block other queries

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.3.0, Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:

      Description

      Summary

      Trying to get the details of a query through the debug web page while the query is planning will block other queries (and the UI itself), because query_exec_state_map_lock_ will be held for the duration of planning.

      Details

      While a query is planning, it holds onto its query exec state's lock:

         lock_guard<mutex> l(*(*exec_state)->lock());
      
          // register exec state as early as possible so that queries that
          // take a long time to plan show up, and to handle incoming status
          // reports before execution starts.
          RETURN_IF_ERROR(RegisterQuery(session_state, *exec_state));
          *registered_exec_state = true;
      
          // GetExecRequest() does planning
          RETURN_IF_ERROR((*exec_state)->UpdateQueryStatus(
              exec_env_->frontend()->GetExecRequest(query_ctx, &result)));
      

      Query details callback

      ImpalaServer::QuerySummaryCallback, which handles /query_plan, tries to get the same exec state's lock (see true argument to GetQueryExecState().

      shared_ptr<QueryExecState> exec_state = GetQueryExecState(query_id, true);
      

      GetQueryExecState() holds query_exec_state_map_lock_ while it waits to get the QueryExecState's lock:

      shared_ptr<ImpalaServer::QueryExecState> ImpalaServer::GetQueryExecState(
          const TUniqueId& query_id, bool lock) {
        lock_guard<mutex> l(query_exec_state_map_lock_);
        QueryExecStateMap::iterator i = query_exec_state_map_.find(query_id);
        if (i == query_exec_state_map_.end()) {
          return shared_ptr<QueryExecState>();
        } else {
          if (lock) i->second->lock()->lock();
          return i->second;
        }
      }
      

      So until planning is finished, no query can get query_exec_state_map_lock_, which it needs to execute.

      What can we do?

      In the short term, maybe we can add TryGetQueryExecState() which will indicate if the query exists but its lock can't be taken.

      Or we might be able to let go of query_exec_state_map_lock_ as soon as we find the entry, and before taking its lock:

      shared_ptr<ImpalaServer::QueryExecState> ImpalaServer::GetQueryExecState(
          const TUniqueId& query_id, bool lock) {
        shared_ptr<QueryExecState> ret;
        {
          lock_guard<mutex> l(query_exec_state_map_lock_);
          QueryExecStateMap::iterator i = query_exec_state_map_.find(query_id);
          if (i == query_exec_state_map_.end()) {
            return shared_ptr<QueryExecState>();
          } else {
            ret = i->second;
          }
        } // give up query_exec_state_map_lock_
      
        if (lock) ret->lock()->lock();
        return ret;
      }
      

        Issue Links

          Activity

          Hide
          bharathv bharath v added a comment -

          Balazs Jeszenszky This only fixes the web server locking problem by relaxing the locking conditions. IMPALA-3882 is part of a bigger clean up and has a wider scope (for ex: points 1 & 3 are not fixed yet). So I think we should not close it.

          Show
          bharathv bharath v added a comment - Balazs Jeszenszky This only fixes the web server locking problem by relaxing the locking conditions. IMPALA-3882 is part of a bigger clean up and has a wider scope (for ex: points 1 & 3 are not fixed yet). So I think we should not close it.
          Hide
          jeszyb Balazs Jeszenszky added a comment -

          bharath v can you confirm this fixes IMPALA-3882 as well? If so, please resolve that one too, thanks!

          Show
          jeszyb Balazs Jeszenszky added a comment - bharath v can you confirm this fixes IMPALA-3882 as well? If so, please resolve that one too, thanks!
          Hide
          bharathv bharath v added a comment -

          Author: Bharath Vissapragada <bharathv@cloudera.com>
          Date: 2017-05-23 (Tue, 23 May 2017)

          Changed paths:
          M be/src/service/impala-beeswax-server.cc
          M be/src/service/impala-hs2-server.cc
          M be/src/service/impala-http-handler.cc
          M be/src/service/impala-server.cc
          M be/src/service/impala-server.h
          A tests/custom_cluster/test_query_concurrency.py
          M www/query_plan.tmpl

          Log Message:
          -----------
          IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention

          Holding client_request_state_map_lock_ and CRS::lock_ together in certain
          paths could potentially block the impalad from registering new queries.
          The most common occurrence of this is while loading the webpage of a
          query while the query planning is still in progress. Since we hold the
          CRS::lock_ during planning, it blocks the web page from loading which
          inturn blocks incoming queries by holding client_request_state_map_lock_.

          This patch makes client_request_state_map_lock_ a terminal lock so that
          we don't have interleaving locking with CRS::lock_.

          Testing: Tested it locally by adding a long sleep in
          JniFrontend.createExecRequest() and still was able to refresh the web UI
          and run parallel queries. Also added a custom cluster test that does the
          same sequence of actions by injecting a metadata loading pause.

          Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
          Reviewed-on: http://gerrit.cloudera.org:8080/6707
          Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
          Tested-by: Impala Public Jenkins

          Show
          bharathv bharath v added a comment - Author: Bharath Vissapragada <bharathv@cloudera.com> Date: 2017-05-23 (Tue, 23 May 2017) Changed paths: M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h A tests/custom_cluster/test_query_concurrency.py M www/query_plan.tmpl Log Message: ----------- IMPALA-1972 / IMPALA-3882 : Fix client_request_state_map_lock_ contention Holding client_request_state_map_lock_ and CRS::lock_ together in certain paths could potentially block the impalad from registering new queries. The most common occurrence of this is while loading the webpage of a query while the query planning is still in progress. Since we hold the CRS::lock_ during planning, it blocks the web page from loading which inturn blocks incoming queries by holding client_request_state_map_lock_. This patch makes client_request_state_map_lock_ a terminal lock so that we don't have interleaving locking with CRS::lock_. Testing: Tested it locally by adding a long sleep in JniFrontend.createExecRequest() and still was able to refresh the web UI and run parallel queries. Also added a custom cluster test that does the same sequence of actions by injecting a metadata loading pause. Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Reviewed-on: http://gerrit.cloudera.org:8080/6707 Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Impala Public Jenkins
          Hide
          bharathv bharath v added a comment -

          Dan Hecht I've sent a patch for review, if you are interested. https://gerrit.cloudera.org/#/c/6707/

          Show
          bharathv bharath v added a comment - Dan Hecht I've sent a patch for review, if you are interested. https://gerrit.cloudera.org/#/c/6707/
          Hide
          henryr Henry Robinson added a comment -

          The health check path from CM doesn't hold onto the map lock while getting the exec state lock. So while the checks can get backed up in this scenario, they don't make it worse by increasing contention for the map lock (they do increase contention for the exec state lock).

          Show
          henryr Henry Robinson added a comment - The health check path from CM doesn't hold onto the map lock while getting the exec state lock. So while the checks can get backed up in this scenario, they don't make it worse by increasing contention for the map lock (they do increase contention for the exec state lock).
          Hide
          aivanov_impala_e71b Antoni added a comment -

          Hi,

          Could Cloudera Manager health checks exacerbate that problem ? Cloudera Impala Daemon Query Monitoring Status Check (http://www.cloudera.com/documentation/enterprise/latest/topics/cm_ht_impala_daemon.html#concept_vvx_rhn_yk) does try to get query details, right ?

          Show
          aivanov_impala_e71b Antoni added a comment - Hi, Could Cloudera Manager health checks exacerbate that problem ? Cloudera Impala Daemon Query Monitoring Status Check ( http://www.cloudera.com/documentation/enterprise/latest/topics/cm_ht_impala_daemon.html#concept_vvx_rhn_yk ) does try to get query details, right ?

            People

            • Assignee:
              bharathv bharath v
              Reporter:
              henryr Henry Robinson
            • Votes:
              1 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development