After running large scale concurrent tests it was obvious that a dedicated coordinator.
During the concurrency tests resource utilization in terms of CPU, Memory and Network were significantly higher than other worker nodes.
The increase in resource utilization was mostly due to
- Sending exec plan fragment to remote fragments, ExecPlanFragment can become very large for complex queries with lots of files & partitions
- Runtime filters consume lots of untracked memory when the bloom filters are aggregated
- Runtime filters consume lots of Network throughput on the coordinator as it receives, aggregates and sends the filters
- Aggregating RunProfile counters
- When CPU utilization on the coordinator is high remote fragments occasionally fail with "Couldn't get a client for impala-8-2.vpc.cloudera.com:22000 Reason: Couldn't open transport for impala-compete-8-2.vpc.cloudera.com:22000 (connect() failed: Connection timed out)"
- Slow start of remote query fragments specially when the coordinator node is under heavy load
- Admission control counters can become out of sync when using multiple coordinators
Metadata should naturally benefit from running a dedicated coordinator as metadata is less likely to become out of sync and one day we can get rid of expensive catalog updates.
The dedicated coordinator should only run the and not participate in any scans, aggs etc..
This work will require CM support.
Running a dedicated coordinator significantly improved performance of complex TPC-DS queries
|Query||Baseline (seconds)||Dedicated coord (seconds)||Speedup %|