YARN-7327: CapacityScheduler: Allocate containers asynchronously by default


Details

    • Type: Improvement
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved

    Description

      I was recently doing some research into Spark on YARN's startup time and observed slow, synchronous allocation of containers/executors. I am testing on a 4-node bare-metal cluster with 48 cores and 128 GB of memory per node. YARN was only allocating about 3 containers per second. Moreover, when starting 3 Spark applications at the same time, each requesting 44 containers, the first application would get all 44 requested containers before the next application started getting any, and so on.

      From looking at the code, it appears this is by design. There is an undocumented configuration variable that will enable asynchronous allocation of containers. I'm sure I'm missing something, but why is this not the default? Is there a bug or race condition in this code path? I've done some testing with it and it's been working and is significantly faster.

      Here's the config:
      `yarn.scheduler.capacity.schedule-asynchronously.enable`

      Any help understanding this would be appreciated.

      Thanks,
      Craig

      If you're curious about the performance difference with this setting, here are the results:

      The following tool was used for the benchmarks:
      https://github.com/SparkTC/spark-bench

      Async Scheduler Research

      The goal of this test is to determine whether running Spark on YARN with async scheduling of containers reduces the time required for an application to receive all of its requested resources. This setting should also reduce the overall runtime of short-lived applications/stages or notebook paragraphs, and it could prove crucial to achieving optimal performance when sharing resources on a cluster with dynamic allocation (dynalloc) enabled.

      Test Setup

      Update /etc/hadoop/conf/capacity-scheduler.xml (or change it through Ambari) between runs:
      `yarn.scheduler.capacity.schedule-asynchronously.enable=true|false`
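      For reference, here is a minimal sketch of how that toggle might look in capacity-scheduler.xml; only the relevant property is shown, and the rest of the file is assumed to be unchanged:

      ```xml
      <configuration>
        <!-- Enable asynchronous container allocation in the CapacityScheduler.
             Set to false (the current default) for the synchronous runs. -->
        <property>
          <name>yarn.scheduler.capacity.schedule-asynchronously.enable</name>
          <value>true</value>
        </property>
      </configuration>
      ```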

      The conf files request executor counts of:

      • 2
      • 20
      • 50
      • 100

      The apps are submitted to the default queue on each cluster, which is capped at 48 cores on the dynalloc cluster and 72 cores on baremetal. The default queue was expanded for the last two tests on baremetal so it could potentially take advantage of all 144 cores.
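      For illustration, a queue cap like this could be expressed with the standard CapacityScheduler capacity properties; the sketch below is an assumption about how such a setup might look, not a copy of the test clusters' configuration:

      ```xml
      <!-- Sketch: on the 144-core baremetal cluster, a value of 50 caps the
           default queue at 72 cores; raising both values to 100 would let it
           use all 144 cores for the "one queue to rule them all" runs. -->
      <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>50</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
        <value>50</value>
      </property>
      ```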

      Test Environments

      dynalloc

      4 VMs in Fyre (1 master, 3 workers)
      8 CPUs/16 GB per node
      model name : QEMU Virtual CPU version 2.5+

      baremetal

      4 baremetal instances in Fyre (1 master, 3 workers)
      48 CPUs/128 GB per node
      model name : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

      Using spark-bench with timedsleep workload sync

      dynalloc

      | requested containers | avg       | stdev    |
      |----------------------|-----------|----------|
      | 2                    | 23.814900 | 1.110725 |
      | 20                   | 29.770250 | 0.830528 |
      | 50                   | 44.486600 | 0.593516 |
      | 100                  | 44.337700 | 0.490139 |

      baremetal - 2 queues splitting cluster 72 cores each

      | requested containers | avg       | stdev    |
      |----------------------|-----------|----------|
      | 2                    | 14.827000 | 0.292290 |
      | 20                   | 19.613150 | 0.155421 |
      | 50                   | 30.768400 | 0.083400 |
      | 100                  | 40.931850 | 0.092160 |

      baremetal - 1 queue to rule them all - 144 cores

      | requested containers | avg       | stdev    |
      |----------------------|-----------|----------|
      | 2                    | 14.833050 | 0.334061 |
      | 20                   | 19.575000 | 0.212836 |
      | 50                   | 30.765350 | 0.111035 |
      | 100                  | 41.763300 | 0.182700 |

      Using spark-bench with timedsleep workload async

      dynalloc

      1st run
      | requested containers | avg       | stdev    |
      |----------------------|-----------|----------|
      | 2                    | 22.575150 | 0.574296 |
      | 20                   | 26.904150 | 1.244602 |
      | 50                   | 44.721800 | 0.655388 |
      | 100                  | 44.570000 | 0.514540 |

      2nd run
      | requested containers | avg       | stdev    |
      |----------------------|-----------|----------|
      | 2                    | 22.441200 | 0.715875 |
      | 20                   | 26.683400 | 0.583762 |
      | 50                   | 44.227250 | 0.512568 |
      | 100                  | 44.238750 | 0.329712 |

      baremetal - 2 queues splitting cluster 72 cores each

      | requested containers | avg       | stdev    |
      |----------------------|-----------|----------|
      | 2                    | 12.902350 | 0.125505 |
      | 20                   | 13.830600 | 0.169598 |
      | 50                   | 16.738050 | 0.265091 |
      | 100                  | 40.654500 | 0.111417 |

      baremetal - 1 queue to rule them all - 144 cores

      | requested containers | avg       | stdev    |
      |----------------------|-----------|----------|
      | 2                    | 12.987150 | 0.118169 |
      | 20                   | 13.837150 | 0.145871 |
      | 50                   | 16.816300 | 0.253437 |
      | 100                  | 23.113450 | 0.320744 |

      Attachments

        1. async-scheduling-results.md (3 kB, Craig Ingram)
        2. schedule-async.png (41 kB, Craig Ingram)
        3. spark-on-yarn-schedule-async.ipynb (58 kB, Craig Ingram)
        4. yarn-async-scheduling.png (19 kB, Craig Ingram)


          People

            Assignee: Unassigned
            Reporter: Craig Ingram
            Votes: 0
            Watchers: 9
