[YARN-7327] CapacityScheduler: Allocate containers asynchronously by default - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Trivial
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
None

Description

I was recently doing some research into Spark on YARN's startup time and observed slow, synchronous allocation of containers/executors. I am testing on a 4 node bare metal cluster w/48 cores and 128GB memory per node. YARN was only allocating about 3 containers per second. Moreover when starting 3 Spark applications at the same time with each requesting 44 containers, the first application would get all 44 requested containers and then the next application would start getting containers and so on.

From looking at the code, it appears this is by design. There is an undocumented configuration variable that will enable asynchronous allocation of containers. I'm sure I'm missing something, but why is this not the default? Is there a bug or race condition in this code path? I've done some testing with it and it's been working and is significantly faster.

Here's the config:
`yarn.scheduler.capacity.schedule-asynchronously.enable`

Any help understanding this would be appreciated.

Thanks,
Craig

If you're curious about the performance difference with this setting, here are the results:

The following tool was used for the benchmarks:
https://github.com/SparkTC/spark-bench

async scheduler research

The goal of this test is to determine if running Spark on YARN with async scheduling of containers reduces the amount of time required for an application to receive all of its requested resources. This setting should also reduce the overall runtime of short-lived applications/stages or notebook paragraphs. This setting could prove crucial to achieving optimal performance when sharing resources on a cluster with dynalloc enabled.

Test Setup

Must update /etc/hadoop/conf/capacity-scheduler.xml (or through Ambari) between runs.
`yarn.scheduler.capacity.schedule-asynchronously.enable=true|false`

conf files request executors counts of:

2
20
50
100

The apps are being submitted to the default queue on each cluster which caps at 48 cores on dynalloc and 72 cores on baremetal. The default queue was expanded for the last two tests on baremetal so it could potentially take advantage of all 144 cores.

Test Environments

dynalloc

4 VMs in Fyre (1 master, 3 workers)
8 CPUs/16 GB per node
model name : QEMU Virtual CPU version 2.5+

baremetal

4 baremetal instances in Fyre (1 master, 3 workers)
48 CPUs/128GB per node
model name : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Using spark-bench with timedsleep workload sync

dynalloc

requested containers	avg	stdev
2	23.814900	1.110725
20	29.770250	0.830528
50	44.486600	0.593516
100	44.337700	0.490139

baremetal - 2 queues splitting cluster 72 cores each

requested containers	avg	stdev
2	14.827000	0.292290
20	19.613150	0.155421
50	30.768400	0.083400
100	40.931850	0.092160

baremetal - 1 queue to rule them all - 144 cores

requested containers	avg	stdev
2	14.833050	0.334061
20	19.575000	0.212836
50	30.765350	0.111035
100	41.763300	0.182700

Using spark-bench with timedsleep workload async

dynalloc

requested containers	avg	stdev
2	22.575150	0.574296
20	26.904150	1.244602
50	44.721800	0.655388
100	44.570000	0.514540

2nd run

requested containers	avg	stdev
2	22.441200	0.715875
20	26.683400	0.583762
50	44.227250	0.512568
100	44.238750	0.329712

baremetal - 2 queues splitting cluster 72 cores each

requested containers	avg	stdev
2	12.902350	0.125505
20	13.830600	0.169598
50	16.738050	0.265091
100	40.654500	0.111417

baremetal - 1 queue to rule them all - 144 cores

requested containers	avg	stdev
2	12.987150	0.118169
20	13.837150	0.145871
50	16.816300	0.253437
100	23.113450	0.320744

CapacityScheduler: Allocate containers asynchronously by default

Details

Description

async scheduler research

Test Setup

Test Environments

dynalloc

baremetal

Using spark-bench with timedsleep workload sync

dynalloc

baremetal - 2 queues splitting cluster 72 cores each

baremetal - 1 queue to rule them all - 144 cores

Using spark-bench with timedsleep workload async

dynalloc

2nd run

baremetal - 2 queues splitting cluster 72 cores each

baremetal - 1 queue to rule them all - 144 cores

Attachments

Attachments

Activity

People

Dates