Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3807

Introduce support for dedicated Impalad coordinator(s)

    Details

      Description

      After running large scale concurrent tests it was obvious that a dedicated coordinator.
      During the concurrency tests resource utilization in terms of CPU, Memory and Network were significantly higher than other worker nodes.

      The increase in resource utilization was mostly due to

      • Sending exec plan fragment to remote fragments, ExecPlanFragment can become very large for complex queries with lots of files & partitions
      • Runtime filters consume lots of untracked memory when the bloom filters are aggregated
      • Runtime filters consume lots of Network throughput on the coordinator as it receives, aggregates and sends the filters
      • Aggregating RunProfile counters
      • When CPU utilization on the coordinator is high remote fragments occasionally fail with "Couldn't get a client for impala-8-2.vpc.cloudera.com:22000 Reason: Couldn't open transport for impala-compete-8-2.vpc.cloudera.com:22000 (connect() failed: Connection timed out)"
      • Slow start of remote query fragments specially when the coordinator node is under heavy load
      • Admission control counters can become out of sync when using multiple coordinators

      Metadata should naturally benefit from running a dedicated coordinator as metadata is less likely to become out of sync and one day we can get rid of expensive catalog updates.

      The dedicated coordinator should only run the Coordinator Fragment and not participate in any scans, aggs etc..

      This work will require CM support.

      Running a dedicated coordinator significantly improved performance of complex TPC-DS queries

      Query Baseline (seconds) Dedicated coord (seconds) Speedup %
      74 73 43 70%
      75 152 91 67%
      4 127 78 63%
      25 53 34 56%
      95 35 23 52%
      11 82 54 52%
      85 37 25 48%
      17 71 48 48%
      64 147 101 46%
      29 65 47 38%
      72 408 312 31%
      18 36 28 29%
      44 36 28 29%
      60 6 5 20%
      77 12 10 20%
      50 43 36 19%
      5 60 51 18%
      81 14 12 17%
      23 198 170 16%
      80 29 25 16%
      35 22 19 16%
      68 15 13 15%
      78 551 486 13%
      31 36 32 13%

        Activity

        Hide
        henryr Henry Robinson added a comment -

        I think this makes a lot of sense, but doesn't quite go far enough. A single coordinator can't reasonably serve a several hundred node cluster, and has poor fault-tolerance properties. Instead, my suggestion is to distinguish two kinds of Impala daemons:

        • Coordinators accept client connections, plan and distribute queries, manage the query lifecycle, aggregate filters and runtime profiles, and maybe stream results back to the client. They should not execute even the coordinator fragment.
        • Executors only execute plan fragments. They are, in a sense, stateless. As a general rule, it's much easier to scale stateless services.

        There should obviously be far fewer coordinators than executors. If done right, only the coordinators need to receive statestore updates (and eventually, the functionality of the statestore can be assumed by the coordinator nodes themselves). As you say, it's far easier to stop metadata from getting stale if it's only synchronized between a small cluster of coordinators, rather than the entire cluster.

        There are many advantages to this architecture beyond performance (and some potential drawbacks that would need to be measured as well). I think it's definitely the right direction.

        Show
        henryr Henry Robinson added a comment - I think this makes a lot of sense, but doesn't quite go far enough. A single coordinator can't reasonably serve a several hundred node cluster, and has poor fault-tolerance properties. Instead, my suggestion is to distinguish two kinds of Impala daemons: Coordinators accept client connections, plan and distribute queries, manage the query lifecycle, aggregate filters and runtime profiles, and maybe stream results back to the client. They should not execute even the coordinator fragment. Executors only execute plan fragments. They are, in a sense, stateless. As a general rule, it's much easier to scale stateless services. There should obviously be far fewer coordinators than executors. If done right, only the coordinators need to receive statestore updates (and eventually, the functionality of the statestore can be assumed by the coordinator nodes themselves). As you say, it's far easier to stop metadata from getting stale if it's only synchronized between a small cluster of coordinators, rather than the entire cluster. There are many advantages to this architecture beyond performance (and some potential drawbacks that would need to be measured as well). I think it's definitely the right direction.
        Hide
        alex.behm Alexander Behm added a comment -

        Dimitris Tsirogiannis is this one fixed?

        Show
        alex.behm Alexander Behm added a comment - Dimitris Tsirogiannis is this one fixed?
        Hide
        dtsirogiannis Dimitris Tsirogiannis added a comment -

        Alexander Behm, yes. Resolved it.

        Show
        dtsirogiannis Dimitris Tsirogiannis added a comment - Alexander Behm , yes. Resolved it.
        Hide
        feng_xiao_yu fengYu added a comment -

        Is there any way to use the feature without CM? I suppose adding a configuration to impalad which specify the role : Coordinators ? Executors or all.
        In this way, we can use the feature easier.

        Show
        feng_xiao_yu fengYu added a comment - Is there any way to use the feature without CM? I suppose adding a configuration to impalad which specify the role : Coordinators ? Executors or all. In this way, we can use the feature easier.
        Hide
        henryr Henry Robinson added a comment -

        fengYu - you can set -is_executor and -is_coordinator for each Impala daemon. Is that what you were hoping for, or something else?

        Show
        henryr Henry Robinson added a comment - fengYu - you can set - is_executor and -is_coordinator for each Impala daemon. Is that what you were hoping for, or something else?
        Hide
        feng_xiao_yu fengYu added a comment -

        Henry Robinson Yeah, I think those configuration is what I need, I will try it out, Thanks a lot.

        Show
        feng_xiao_yu fengYu added a comment - Henry Robinson Yeah, I think those configuration is what I need, I will try it out, Thanks a lot.

          People

          • Assignee:
            Unassigned
            Reporter:
            mmokhtar Mostafa Mokhtar
          • Votes:
            1 Vote for this issue
            Watchers:
            16 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development