SPARK-21383

YARN can allocate too many executors

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.2.1, 2.3.0
    • Component/s: Spark Core, YARN
    • Labels: None

    Description

      The YarnAllocator does not properly track containers that are being launched but are not yet running. If launching the containers on the NodeManager (NM) takes time, they are not counted in numExecutorsRunning, but they have already been removed from the pending list. If allocateResources is called again during that window, the allocator concludes that executors are missing when they simply have not finished launching.

      This was introduced by SPARK-12447.

      Where it checks for missing executors:
      https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L297

      It only updates numExecutorsRunning after the NM has started the container:
      https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L524

      Thus, if the NM or the network is slow, the allocator can miscount and start additional executors.
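The miscount described above can be illustrated with a toy model. This is a minimal sketch in Python, not Spark's Scala code: the class, its method names, and the `count_starting` switch are all hypothetical and exist only to contrast the two ways of counting. It assumes "missing" is computed as the target minus the executors the allocator knows about, as the description implies.

```python
from dataclasses import dataclass


@dataclass
class AllocatorModel:
    """Toy model of the executor bookkeeping; not Spark's actual YarnAllocator."""
    target: int                   # executors the application wants
    count_starting: bool = False  # hypothetical switch: also count in-flight launches
    pending: int = 0              # requested from YARN, container not yet granted
    starting: int = 0             # container granted, launch on the NM still in flight
    running: int = 0              # executor confirmed started by the NM

    def missing(self) -> int:
        """Executors the allocator believes it still has to request."""
        known = self.pending + self.running
        if self.count_starting:
            known += self.starting
        return max(self.target - known, 0)

    def allocate_resources(self) -> int:
        """One allocation pass: request whatever appears to be missing."""
        requested = self.missing()
        self.pending += requested
        return requested

    def containers_granted(self, n: int) -> None:
        """YARN grants n containers: they leave pending and begin launching."""
        n = min(n, self.pending)
        self.pending -= n
        self.starting += n

    def launches_completed(self, n: int) -> None:
        """The NM reports n launches done: they finally count as running."""
        n = min(n, self.starting)
        self.starting -= n
        self.running += n


def simulate(count_starting: bool) -> int:
    """Run two allocation passes around a slow launch; return executors running."""
    alloc = AllocatorModel(target=4, count_starting=count_starting)
    alloc.allocate_resources()                # first pass requests 4
    alloc.containers_granted(4)               # all 4 leave pending; the NM is slow
    alloc.allocate_resources()                # second pass runs before any launch completes
    alloc.containers_granted(alloc.pending)   # YARN grants any extra requests
    alloc.launches_completed(alloc.starting)  # every launch eventually finishes
    return alloc.running


print(simulate(count_starting=False))  # in-flight launches ignored
print(simulate(count_starting=True))   # in-flight launches counted
```

With a target of 4, ignoring in-flight launches ends with 8 executors running, while counting them ends with 4. Counting containers that are between "granted" and "running" is one plausible way to close the window; the sketch does not claim to match the change that resolved this issue.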

            People

              Assignee: DjvuLee
              Reporter: Thomas Graves (tgraves)
              Votes: 0
              Watchers: 6